From wolfgang.meyer at gmail.com  Tue Jan  1 12:33:41 2008
From: wolfgang.meyer at gmail.com (Wolfgang Meyer)
Date: Tue, 1 Jan 2008 18:33:41 +0100
Subject: [BioPython] residue sequence number length (no more than 4 digits)
Message-ID: <d38070360801010933y5bf2f2b4h2b8193bebbaaa97e@mail.gmail.com>

Hi,

According to the (old) PDB format, the residue sequence number should be no
more than 4 digits long:

...
23 - 26    Integer     resSeq    Residue sequence number.
...

However, neither Bio.PDB.Residue.__init__(...) nor Bio.PDB.PDBIO checks the
length of this parameter, although Bio.PDB.PDBIO does try to restrict the
residue sequence number to 4 characters in its format string:

_ATOM_FORMAT_STRING="%s%5i %-4s%c%3s %c%4i%c   %8.3f%8.3f%8.3f%6.2f%6.2f      %4s%2s%2s\n"

This does not prevent a residue sequence number longer than 4 digits from
being written into a PDB file by PDBIO. Such a PDB file would be rejected as
invalid by many PDB file parsers.

Of course, users should be responsible for giving a residue a sequence number
of valid length. However, wouldn't it be better for Biopython to handle
careless input of an invalid residue sequence number?

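A minimal sketch of the kind of check being asked for, assuming it would sit on the writer side. The helper name `format_resseq` is hypothetical and not part of Bio.PDB. It relies on the fact that `%4i` sets a minimum field width, not a maximum, so `"%4i" % 12345` produces five characters and shifts every later column one place to the right.

```python
# Hypothetical helper, not part of Bio.PDB: format resSeq for PDB columns 23-26
# and refuse values that would overflow the fixed-width field.
RESSEQ_WIDTH = 4  # columns 23-26 of the (old) PDB ATOM/HETATM record


def format_resseq(resseq):
    """Return resseq right-justified in 4 characters, or raise ValueError."""
    text = "%4i" % resseq  # %4i pads short numbers but never truncates long ones
    if len(text) > RESSEQ_WIDTH:
        raise ValueError(
            "residue sequence number %r needs %d characters, "
            "but the PDB resSeq field has only %d" % (resseq, len(text), RESSEQ_WIDTH)
        )
    return text


if __name__ == "__main__":
    print(repr("%4i" % 12345))  # the bare format string silently writes 5 characters
    print(repr(format_resseq(42)))
    try:
        format_resseq(12345)
    except ValueError as err:
        print("rejected: %s" % err)
```

Checking the formatted length (rather than the numeric value) also catches negative numbers such as -1000, which need five characters as well.
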
Thanks!
-- 
Wolfgang Meyer

From hlapp at gmx.net  Tue Jan  1 18:25:39 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 1 Jan 2008 18:25:39 -0500
Subject: [BioPython] [BioSQL-l] Authority in biodatabase table
In-Reply-To: <320fb6e00711261110g63c156a1w8b76a797fe12e2b1@mail.gmail.com>
References: <320fb6e00711261110g63c156a1w8b76a797fe12e2b1@mail.gmail.com>
Message-ID: <A25B9456-748B-4664-A37F-217C31B70260@gmx.net>

(Sorry for this much-too-late reply. I'm going through old email that was
left unread or unanswered.)

Peter - you probably implemented something meanwhile that suits your  
needs. Just FYI, BioPerl leaves this empty too. The general notion  
for authority is that of the LSID authority field, but of course you  
won't be able to parse this out of any input file. The value for  
SwissProt would be uniprot.org, for example. For NCBI, I'm not sure -  
NCBI hasn't ever issued any LSIDs, but presumably it would be  
something like ncbi.nlm.nih.gov.

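To make the difference between a NULL authority and an LSID-style authority concrete, here is a standard-library `sqlite3` sketch. It uses a cut-down `biodatabase` table with only the `name`, `authority` and `description` columns discussed in this thread (the real BioSQL schema has more constraints and is normally used through MySQL or PostgreSQL), and the helper `add_namespace` is hypothetical, not part of Biopython's BioSQL package.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a simplified biodatabase table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE biodatabase ("
        " biodatabase_id INTEGER PRIMARY KEY,"
        " name VARCHAR(128) NOT NULL UNIQUE,"
        " authority VARCHAR(128),"  # nullable: stays NULL unless a value is supplied
        " description TEXT)"
    )
    return conn


def add_namespace(conn, name, authority=None, description=None):
    """Hypothetical helper: insert one biodatabase row; authority defaults to NULL."""
    cur = conn.execute(
        "INSERT INTO biodatabase (name, authority, description) VALUES (?, ?, ?)",
        (name, authority, description),
    )
    return cur.lastrowid


conn = make_db()
add_namespace(conn, "orchids", description="Just for testing")  # authority left NULL
add_namespace(conn, "swissprot", authority="uniprot.org")        # LSID-style authority
for row in conn.execute("SELECT name, authority FROM biodatabase ORDER BY biodatabase_id"):
    print(row)
```
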
	-hilmar

On Nov 26, 2007, at 2:10 PM, Peter wrote:

> Thanks for all the replies on the db_xref issue.
>
> Today I'd like to ask if there are any established guidelines for the
> biodatabase table - in particular for how to use the "authority" field
> in the biodatabase table, and if there is any agreed terminology for
> the named "sub databases" defined therein i.e. what should I call them
> in our documentation.
>
> By default, unless the user specifies an authority, we end up with a
> NULL when creating entries in the biodatabase table using Biopython.
> For example:
>
> from BioSQL import BioSeqDatabase
> server = BioSeqDatabase.open_database(driver="MySQLdb", user="root",
>                      passwd = "", host = "localhost", db="bioseqdb")
> db = server.new_database("orchids", description="Just for testing")
> server.adaptor.commit()
>
> I'd like to give some sensible defaults in any worked examples.  Apart
> from simple test cases (like above), sensible examples that came to
> mind would be creating a "sub database" to contain:
> (*) an entire GenBank release
> (*) the latest SwissProt release
>
> What would you use in these cases?  In fact, what does your
> biodatabase table contain right now?
>
> Thank you all,
>
> Peter
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From lee.byung-chul at kaist.ac.kr  Wed Jan  2 06:00:37 2008
From: lee.byung-chul at kaist.ac.kr (Lee,Byung-chul)
Date: Wed, 02 Jan 2008 20:00:37 +0900
Subject: [BioPython] FormatConverter: from Fasta format to ClustalW format
Message-ID: <477B6ED5.8080005@kaist.ac.kr>


Dear colleagues.

I want to use AlignInfo.SummaryInfo on a FASTA-format alignment file.
I thought that the FASTA file first had to be converted to ClustalW
format, so I tried to use FormatConverter. However, I could not get it
to work. Here is what I did:

----
#!/usr/bin/env python

from Bio import Fasta
from Bio.Align.FormatConvert import FormatConverter
from Bio.Alphabet import IUPAC

alignment = Fasta.FastaAlign.parse_file('tmp.fasta',type='PROTEIN')
converter = FormatConverter(alignment)
clw_align = converter.to_clustal()

print clw_align
----
and tmp.fasta is
---
>seq2
DAC
>seq3
DC-
>seq1
DAD
>seq4
DDD

But an error occurred.
The error messages are below:
---
Traceback (most recent call last):
  File "tmp.py", line 7, in <module>
    alignment = Fasta.FastaAlign.parse_file('tmp.fasta', type='PROTEIN')
  File "/var/lib/python-support/python2.5/Bio/Fasta/FastaAlign.py", line 48, in parse_file
    cur_align = iterator.next()
  File "/var/lib/python-support/python2.5/Bio/Fasta/__init__.py", line 72, in next
    result = self._iterator.next()
  File "/var/lib/python-support/python2.5/Martel/IterParser.py", line 152, in iterateFile
    self.header_parser.parseString(rec)
  File "/var/lib/python-support/python2.5/Martel/Parser.py", line 356, in parseString
    self._err_handler.fatalError(result)
  File "/usr/lib/python2.5/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
    raise exception
Martel.Parser.ParserPositionException: error parsing at or beyond character 0
-----
What should I do? Could you advise me?

Thank you!

Byung chul Lee

From biopython at maubp.freeserve.co.uk  Wed Jan  2 06:54:34 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jan 2008 11:54:34 +0000
Subject: [BioPython] FormatConverter: from Fasta format to ClustalW
	format
In-Reply-To: <477B6ED5.8080005@kaist.ac.kr>
References: <477B6ED5.8080005@kaist.ac.kr>
Message-ID: <320fb6e00801020354v5d7d9dr42034cdf99a86c03@mail.gmail.com>

Hello Byung chul Lee,

On 1/2/08, Lee,Byung-chul wrote:
>
> Dear colleagues.
>
> I want to use the AlignInfo.SummaryInfo for fasta-format alignment file.
> I think that to do the process firstly the fasta format should be
> converted to clustalw format, so I try to use Formatconverter.
> However, at my trial, I cannot do that.

Once you have an alignment object (loaded from any file format), this
should work with AlignInfo.  I don't think you need to convert it from
FASTA to ClustalW.

I would guess the error you saw is a problem with Biopython/Martel and
mxTextTools 3.0, which isn't 100% compatible with mxTextTools 2.0.
What version of Biopython are you using, as I would have expected this
to work fine with Biopython 1.44?

You could also try using Bio.SeqIO to load the FASTA format alignment
file instead, see http://biopython.org/wiki/SeqIO

from Bio import SeqIO
from Bio.Align import AlignInfo
alignment = SeqIO.to_alignment(SeqIO.parse(open('tmp.fasta'), "fasta"))
summary_align = AlignInfo.SummaryInfo(alignment)

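For readers without Biopython at hand, the standard-library sketch below shows what "loading a FASTA file as an alignment" amounts to: reading (title, sequence) pairs and checking that every sequence has the same length. `parse_fasta` and `alignment_length` are hypothetical names, not Bio.SeqIO functions, and the sketch ignores alphabets entirely; the example data is the tmp.fasta content from this thread.

```python
def parse_fasta(lines):
    """Yield (title, sequence) pairs from FASTA-formatted lines."""
    title, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:].strip(), []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if title is not None:
        yield title, "".join(chunks)


def alignment_length(records):
    """Return the common length of all sequences, or raise if they differ."""
    lengths = set(len(seq) for _, seq in records)
    if len(lengths) != 1:
        raise ValueError("not an alignment: sequence lengths %s" % sorted(lengths))
    return lengths.pop()


example = """>seq2
DAC
>seq3
DC-
>seq1
DAD
>seq4
DDD
"""
records = list(parse_fasta(example.splitlines()))
print(records)
print(alignment_length(records))
```
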
Peter

From biopython at maubp.freeserve.co.uk  Wed Jan  2 06:57:46 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jan 2008 11:57:46 +0000
Subject: [BioPython] [BioSQL-l] Authority in biodatabase table
In-Reply-To: <A25B9456-748B-4664-A37F-217C31B70260@gmx.net>
References: <320fb6e00711261110g63c156a1w8b76a797fe12e2b1@mail.gmail.com>
	<A25B9456-748B-4664-A37F-217C31B70260@gmx.net>
Message-ID: <320fb6e00801020357g724917b5s853d99f2f953753a@mail.gmail.com>

On 1/1/08, Hilmar Lapp <hlapp at gmx.net> wrote:
> (Sorry for this long-too-late reply. Going through old email that got
> left unread or unresponded.)
>
> Peter - you probably implemented something meanwhile that suits your
> needs. Just FYI, BioPerl leaves this empty too. The general notion
> for authority is that of the LSID authority field, but of course you
> won't be able to parse this out of any input file. The value for
> SwissProt would be uniprot.org, for example. For NCBI, I'm not sure -
> NCBI hasn't ever issued any LSIDs, but presumably it would be
> something like ncbi.nlm.nih.gov.
>
>        -hilmar

Thank you Hilmar.

It seems that the current code in Biopython is fine (the authority
field is left blank by default, unless the user supplies their own
value), and consistent with both BioPerl and BioJava in this regard
(thanks Richard).

Peter

From lee.byung-chul at kaist.ac.kr  Wed Jan  2 08:44:47 2008
From: lee.byung-chul at kaist.ac.kr (Lee,Byung-chul)
Date: Wed, 02 Jan 2008 22:44:47 +0900
Subject: [BioPython] FormatConverter: from Fasta format to ClustalW
	format
In-Reply-To: <320fb6e00801020354v5d7d9dr42034cdf99a86c03@mail.gmail.com>
References: <477B6ED5.8080005@kaist.ac.kr>
	<320fb6e00801020354v5d7d9dr42034cdf99a86c03@mail.gmail.com>
Message-ID: <477B954F.9020004@kaist.ac.kr>


Thank you very much for your kind reply, Peter.

Following your suggestion, I tried to use SeqIO, but another error occurred.
Here is what I did:

-----------------
from Bio import SeqIO
from Bio.Align import AlignInfo
alignment = SeqIO.to_alignment(SeqIO.parse(open('tmp.fasta'), "fasta"))
summary_align = AlignInfo.SummaryInfo(alignment)
print summary_align.dumb_consensus()
--------------------
but the result is:
-----------------
Traceback (most recent call last):
  File "tmp.py", line 16, in <module>
    print summary_align.dumb_consensus()
  File "/var/lib/python-support/python2.5/Bio/Align/AlignInfo.py", line 111, in dumb_consensus
    consensus_alpha = self._guess_consensus_alphabet()
  File "/var/lib/python-support/python2.5/Bio/Align/AlignInfo.py", line 189, in _guess_consensus_alphabet
    ("Non-gapped alphabet found in alignment object.")
ValueError: Non-gapped alphabet found in alignment object.
---------------------
In addition, all sequences in my tmp.fasta file have the same length:
-----
>seq2
DAC
>seq3 
DC-
>seq1 
DAD
>seq4
DDD

Is this problem caused by the Biopython/Martel and mxTextTools versions?
I am using biopython 1.43-2 (ubuntu version) and mxtexttools 3.0.0-2ubuntu1.

What should I do for this? Thanks.

Byung chul.


From biopython at maubp.freeserve.co.uk  Wed Jan  2 12:46:25 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jan 2008 17:46:25 +0000
Subject: [BioPython] FormatConverter: from Fasta format to ClustalW
	format
In-Reply-To: <477B954F.9020004@kaist.ac.kr>
References: <477B6ED5.8080005@kaist.ac.kr>
	<320fb6e00801020354v5d7d9dr42034cdf99a86c03@mail.gmail.com>
	<477B954F.9020004@kaist.ac.kr>
Message-ID: <320fb6e00801020946j5b331137s14f9e1d90e888a2e@mail.gmail.com>

On Jan 2, 2008 1:44 PM, Lee,Byung-chul <lee.byung-chul at kaist.ac.kr> wrote:
> As your explanation, I tried to use SeqIO, but another error occured
> I did it like below:

My fault, sorry. I wasn't at a computer with Biopython installed, I
had to guess.  I'll try and put together a proper example for you
tomorrow.

> Is this problem caused by the Biopython/Martel and mxTextTools vesions?
> I am using biopython 1.43-2 (ubuntu version) and mxtexttools 3.0.0-2ubuntu1.

The original problem you reported was due to the combination of
Biopython 1.43 (the Martel module) and mxTextTools 3.0. You can either
update to Biopython 1.44 or downgrade to mxTextTools 2.0 - neither is
going to be very simple if you want to use the Ubuntu repositories.
To avoid this Martel problem, I would suggest you un-install Biopython
1.43 from the Ubuntu repository, and then install Biopython 1.44 from
source.

Peter

From biopython at maubp.freeserve.co.uk  Fri Jan  4 08:20:26 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 4 Jan 2008 13:20:26 +0000
Subject: [BioPython] FormatConverter: from Fasta format to ClustalW
	format
In-Reply-To: <320fb6e00801020946j5b331137s14f9e1d90e888a2e@mail.gmail.com>
References: <477B6ED5.8080005@kaist.ac.kr>
	<320fb6e00801020354v5d7d9dr42034cdf99a86c03@mail.gmail.com>
	<477B954F.9020004@kaist.ac.kr>
	<320fb6e00801020946j5b331137s14f9e1d90e888a2e@mail.gmail.com>
Message-ID: <320fb6e00801040520i11c9a4c4q4449cee34da00706@mail.gmail.com>

On Jan 2, 2008 5:46 PM, Peter wrote:
> On Jan 2, 2008 1:44 PM, Lee,Byung-chul wrote:
> > As your explanation, I tried to use SeqIO, but another error occured
> > I did it like below:
>
> My fault, sorry. I wasn't at a computer with Biopython installed, I
> had to guess.  I'll try and put together a proper example for you
> tomorrow.

This should work on Biopython 1.43 or later, I have tested it using
the simple FASTA file you gave earlier:

from Bio.Alphabet.IUPAC import IUPACProtein
from Bio.Alphabet import Gapped
from Bio import SeqIO
from Bio.Align import AlignInfo
gapped_protein = Gapped(IUPACProtein())

records = list(SeqIO.parse(open('tmp.fasta'), "fasta"))
for rec in records :
    #Override the default generic alphabet:
    rec.seq.alphabet = gapped_protein
#Turn these records into an alignment
alignment = SeqIO.to_alignment(records, gapped_protein)
del records

summary_align = AlignInfo.SummaryInfo(alignment)
print summary_align.dumb_consensus()
print summary_align.gap_consensus()

The problem with my previous shorter suggestion was the Bio.SeqIO
FASTA parser returned SeqRecord objects with a generic alphabet, while
the alignment summary expected a gapped alphabet.  I'm beginning to
think that the Bio.SeqIO.parse() function should allow an alphabet to
be specified as an optional argument for this sort of situation.

Alternatively, going back to your original code, how about:

from Bio.Fasta import FastaAlign
from Bio.Align import AlignInfo

alignment = FastaAlign.parse_file('tmp.fasta',type='PROTEIN')
summary_align = AlignInfo.SummaryInfo(alignment)
print summary_align.dumb_consensus()
print summary_align.gap_consensus()

This works using Biopython 1.44 with either mxTextTools 2.0 or 3.0.
It should work with older versions of Biopython using mxTextTools 2.0
as well.

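As a rough illustration of what a threshold-based consensus such as `dumb_consensus()` computes, here is a standard-library sketch. It is a simplified re-implementation under stated assumptions, not Biopython's code: a column gets its most common residue only if that residue is the unique maximum and reaches the threshold fraction (0.7 by default, with `X` for ambiguous columns), and gap characters are skipped when `ignore_gaps` is true. The function name is hypothetical.

```python
from collections import Counter


def threshold_consensus(seqs, threshold=0.7, ambiguous="X", ignore_gaps=True, gap_chars="-."):
    """Simplified column-wise consensus over equal-length sequences."""
    consensus = []
    for column in zip(*seqs):  # assumes all sequences have the same length
        residues = [c for c in column if not (ignore_gaps and c in gap_chars)]
        counts = Counter(residues)
        if not counts:
            consensus.append(ambiguous)  # column contains only gaps
            continue
        best = max(counts.values())
        winners = [r for r, n in counts.items() if n == best]
        if len(winners) == 1 and float(best) / len(residues) >= threshold:
            consensus.append(winners[0])
        else:
            consensus.append(ambiguous)  # tie, or the top residue is too rare
    return "".join(consensus)


seqs = ["DAC", "DC-", "DAD", "DDD"]  # the tmp.fasta example from this thread
print(threshold_consensus(seqs))                 # -> DXX
print(threshold_consensus(seqs, threshold=0.5))  # -> DAD
```

With the default threshold only the first column is unanimous enough; column 2 is 2/4 "A" and column 3 is 2/3 "D" once the gap is skipped, so both fall below 0.7.
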
Peter

From mjldehoon at yahoo.com  Sat Jan  5 03:41:25 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 5 Jan 2008 00:41:25 -0800 (PST)
Subject: [BioPython] Bio.Ais
Message-ID: <140129.37367.qm@web62402.mail.re1.yahoo.com>

Hi everybody,

I was checking which Biopython modules access Entrez/GenBank in any way, and in the process found the script example_ais2.py in Bio/Ais/Examples (this script is not related to Entrez/GenBank; it just caught my eye because it imports urllib).
Currently, this example script does not seem to work:

$ python example_ais2.py
Traceback (most recent call last):
  File "example_ais2.py", line 39, in <module>
    immune = Immune( align, alphabet, 100 )
...
TypeError: 'int' object is not iterable

The directory Bio/Ais/Examples and its file example_ais2.py appear only in CVS; they are not included in Biopython releases.

Does anybody know how to fix this example? If not, what should we do with it?

--Michiel.


From meesters at uni-mainz.de  Mon Jan  7 13:13:59 2008
From: meesters at uni-mainz.de (Christian Meesters)
Date: Mon, 7 Jan 2008 19:13:59 +0100
Subject: [BioPython] Bio.PDB - adding 'dummy atoms'
Message-ID: <1199729639.13152.20.camel@meesters.biologie.uni-mainz.de>

Hi,

I'd like to add 'dummy atoms' to a Bio.PDB Structure object. So far, I
have this approach:

new = Atom('OX', array([x, y, z]), 0, 1, 0, " OX ", serial_number)
residue.add(new)

Here x, y, and z are floating point numbers and serial_number is an
integer. 'residue' is a 'Residue' I'm iterating over. However, I keep
getting the following error message and don't know how to
proceed:

new = Atom('OX', array([x, y, z]), 0, 1, 0, " OX ", serial_number)
TypeError: object of type 'module' is not callable

Does anyone have a hint on how to actually add an atom, or on what's wrong
here?

TIA
Christian


From biopython at maubp.freeserve.co.uk  Mon Jan  7 13:55:57 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 7 Jan 2008 18:55:57 +0000
Subject: [BioPython] Bio.PDB - adding 'dummy atoms'
In-Reply-To: <1199729639.13152.20.camel@meesters.biologie.uni-mainz.de>
References: <1199729639.13152.20.camel@meesters.biologie.uni-mainz.de>
Message-ID: <320fb6e00801071055n6bcb936dr58e96ac87b6e509d@mail.gmail.com>

Christian Meesters wrote:
> I'd like to add 'dummy atoms' to a Bio.PDB Structure object. So far, I
> have this approach:
> ...
> new = Atom('OX', array([x, y, z]), 0, 1, 0, " OX ", serial_number)
> TypeError: object of type 'module' is not callable
>
> Does anyone have a hint for me, how actually add an atom or what's wrong
> here?

I would infer from the error that "Atom" refers to the Bio.PDB.Atom
module, rather than the Bio.PDB.Atom.Atom class.  How did you do your
imports?  Try this:

from Bio.PDB.Atom import Atom

Peter

From lueck at ipk-gatersleben.de  Tue Jan  8 04:06:40 2008
From: lueck at ipk-gatersleben.de (Stefanie Lück)
Date: Tue, 8 Jan 2008 10:06:40 +0100
Subject: [BioPython] blastall does not exist at %s" % blastcmd
Message-ID: <002301c851d5$c7daac60$1022a8c0@ipkgatersleben.de>

Hi!

I'm trying to get a local BLAST running. I proceeded as described in the Cookbook, but I always get this error message:
>>>
Traceback (most recent call last):
  File "F:\Blast\blast.py", line 10, in <module>
    my_blast_db, my_blast_file)
  File "C:\Python25\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 1499, in blastall
    raise ValueError, "blastall does not exist at %s" % blastcmd
ValueError: blastall does not exist at C:\Blast\bin\blastall.exe
<<<

>>>
My Code:

import Bio
from Bio.Blast import NCBIStandalone
import os

my_blast_db = r"F:\Blast\primerdb"
my_blast_file = "test.fasta"
my_blast_exe = r"C:\Blast\bin\blastall.exe"

result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn",
my_blast_db, my_blast_file)
blast_results = result_handle.read()
save_file = open("my_blast.xml", "w")
save_file.write(blast_results)
save_file.close()
<<<

blastall.exe is in this folder (checked by os.listdir()) but can't be found from the tool.

I'm using Python 2.5 and biopython-1.44.win32-py2.5.exe.

Does anyone have an idea where the problem is?

Greetings 
Stefanie

From biopython at maubp.freeserve.co.uk  Tue Jan  8 05:46:02 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Jan 2008 10:46:02 +0000
Subject: [BioPython] blastall does not exist at %s" % blastcmd
In-Reply-To: <002301c851d5$c7daac60$1022a8c0@ipkgatersleben.de>
References: <002301c851d5$c7daac60$1022a8c0@ipkgatersleben.de>
Message-ID: <320fb6e00801080246t5aa515ccuc8699134b533e8b9@mail.gmail.com>

On Jan 8, 2008 9:06 AM, Stefanie Lück <lueck at ipk-gatersleben.de> wrote:
> Hi!
>
> I'm trying to get a local blast running. I proceeded as described in the cookbook
> but I allways get this Error message:
> >>>
> Traceback (most recent call last):
>   File "F:\Blast\blast.py", line 10, in <module>
>     my_blast_db, my_blast_file)
>   File "C:\Python25\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 1499, in blastall
>     raise ValueError, "blastall does not exist at %s" % blastcmd
> ValueError: blastall does not exist at C:\Blast\bin\blastall.exe
> <<<
>
> >>>
> My Code:
>
> import Bio
> from Bio.Blast import NCBIStandalone
> import os
>
> my_blast_db = r"F:\Blast\primerdb"
> my_blast_file = "test.fasta"
> my_blast_exe = r"C:\Blast\bin\blastall.exe"
>
> result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn",
> my_blast_db, my_blast_file)
> ...
> blastall.exe is in this folder (checked by os.listdir()) but can't be found from the tool.
>

Could you try this, which is the test done in the Biopython blastall
function that triggers the error message you saw:

print os.path.exists(my_blast_exe)

Could you also double check the path is C:\Blast\bin\blastall.exe and
not perhaps C:\Blast\blastall.exe (the NCBI changed this at some point
on Windows). Also did you install it to the F: drive where your
database is, rather than C: ?

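Since the installed layout changed between BLAST releases, one option is to probe the plausible locations before calling blastall. The sketch below is a hypothetical helper (`find_blastall` is not a Biopython function); it uses `os.path.isfile`, which is slightly stricter than the `os.path.exists` test mentioned above because it also rejects directories. The candidate paths are just the two layouts discussed in this thread.

```python
import os


def find_blastall(candidates):
    """Return the first candidate path that exists as a regular file, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


# The two Windows layouts mentioned in this thread; adjust for your install.
CANDIDATES = [
    r"C:\Blast\bin\blastall.exe",
    r"C:\Blast\blastall.exe",
]

if __name__ == "__main__":
    found = find_blastall(CANDIDATES)
    print(found if found else "blastall.exe not found in any candidate location")
```
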
> I'm using Python 2.5 and biopython-1.44.win32-py2.5.exe.

What version of standalone blast do you have?

Peter


From lueck at ipk-gatersleben.de  Tue Jan  8 06:32:54 2008
From: lueck at ipk-gatersleben.de (Stefanie Lück)
Date: Tue, 8 Jan 2008 12:32:54 +0100
Subject: [BioPython] blastall does not exist at %s" % blastcmd
References: <002301c851d5$c7daac60$1022a8c0@ipkgatersleben.de>
	<320fb6e00801080246t5aa515ccuc8699134b533e8b9@mail.gmail.com>
Message-ID: <003a01c851ea$357e5cd0$1022a8c0@ipkgatersleben.de>

Thanks Peter!

C:\Blast\blastall.exe worked!
Sorry for the drive mistake, I have it on both...

But my xml File is empty :-(
I'll try to fix it...

standalone blast version is blast-2.2.17-ia32-win32.exe

Stefanie


From lueck at ipk-gatersleben.de  Tue Jan  8 09:18:08 2008
From: lueck at ipk-gatersleben.de (Stefanie Lück)
Date: Tue, 8 Jan 2008 15:18:08 +0100
Subject: [BioPython] empty xml after local blast
Message-ID: <007e01c85201$4b24b180$1022a8c0@ipkgatersleben.de>

Hi again!

I got blastall running but my xml output file is empty...
Any ideas? 
Where exactly does my FASTA file need to be?

>>>
Code:
import Bio
from Bio.Blast import NCBIStandalone
import os

my_blast_db = r"C:\Blast\primerdb"
my_blast_file = "test.fasta"
my_blast_exe = r"C:\Blast\blastall.exe"

result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn",
my_blast_db, my_blast_file)
blast_results = result_handle.read()
save_file = open("my_blast.xml", "w")
save_file.write(blast_results)
save_file.close()
>>>

I'm using Python 2.5, biopython-1.44.win32-py2.5.exe and blast-2.2.17-ia32-win32.exe

Regards 
Stefanie

From biopython at maubp.freeserve.co.uk  Tue Jan  8 09:33:29 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Jan 2008 14:33:29 +0000
Subject: [BioPython] empty xml after local blast
In-Reply-To: <007e01c85201$4b24b180$1022a8c0@ipkgatersleben.de>
References: <007e01c85201$4b24b180$1022a8c0@ipkgatersleben.de>
Message-ID: <320fb6e00801080633k652b3023r6a8457b4c97143e0@mail.gmail.com>

On Jan 8, 2008 2:18 PM, Stefanie Lück <lueck at ipk-gatersleben.de> wrote:
> Hi again!
>
> I got blastall running but my xml output file is empty...
> Any ideas?

Have you ever tried running blastall.exe from the command line "by
hand"?  This can be very useful, and would let you rule out several
basic problems (e.g. make sure blast is installed correctly, and that
your database is working).

> Where exactly must be my fasta file?

Wherever you like, as long as you specify its location correctly.
Your code below seems to assume that "test.fasta" is in the current
directory (i.e. where you are running your python script from).  Is
this correct?

It may be simpler to use a full path, e.g.
my_blast_file = r"C:\temp\test.fasta"

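A relative name like "test.fasta" is resolved against the current working directory, which is not necessarily the folder the script lives in. This small sketch (the helper name `resolve_input` is hypothetical) turns the path into an absolute one and fails early, reporting the directory it actually looked in, before anything is handed to blastall.

```python
import os


def resolve_input(path):
    """Return path as an absolute path, raising if no such file exists."""
    full = os.path.abspath(path)  # relative paths are resolved against os.getcwd()
    if not os.path.isfile(full):
        raise IOError("input file not found: %s (current directory: %s)"
                      % (full, os.getcwd()))
    return full


# Hypothetical usage with the file name from this thread:
# my_blast_file = resolve_input("test.fasta")
```
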
I suspect that Standalone blast is not finding the input file, or that
it is not finding your database.  If you get an empty XML file, one
thing to try is checking the error output from the command line call:

print error_info.read()

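The same "look at the error output" advice can be applied to any command-line program. The sketch below uses the standard `subprocess` module rather than Bio.Blast.NCBIStandalone, and `run_and_check` is a hypothetical helper: it captures both streams and, when standard output is empty, raises an error that includes whatever the program wrote to standard error (for example a missing-database message). The commented blastall command line is an assumption based on the legacy blastall options (-p program, -d database, -i query, -m 7 for XML output).

```python
import subprocess
import sys


def run_and_check(cmd):
    """Run cmd and return its standard output, or raise if stdout is empty.

    When nothing is written to stdout, the program's stderr text is included
    in the exception so the real cause is visible.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    out, err = proc.communicate()
    if not out.strip():
        raise RuntimeError("no output from %s (exit code %s); stderr: %s"
                           % (cmd[0], proc.returncode, err.strip() or "<empty>"))
    return out


# Hypothetical legacy blastall command line (-m 7 requests XML output):
# cmd = [r"C:\Blast\blastall.exe", "-p", "blastn", "-d", r"C:\Blast\primerdb",
#        "-i", r"C:\temp\test.fasta", "-m", "7"]

if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere: it writes only to stderr.
    demo = [sys.executable, "-c", "import sys; sys.stderr.write('database not found')"]
    try:
        run_and_check(demo)
    except RuntimeError as err:
        print(err)
```
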
Peter


From lueck at ipk-gatersleben.de  Tue Jan  8 10:18:32 2008
From: lueck at ipk-gatersleben.de (Stefanie Lück)
Date: Tue, 8 Jan 2008 16:18:32 +0100
Subject: [BioPython] empty xml after local blast
References: <007e01c85201$4b24b180$1022a8c0@ipkgatersleben.de>
	<320fb6e00801080633k652b3023r6a8457b4c97143e0@mail.gmail.com>
Message-ID: <009d01c85209$bb314210$1022a8c0@ipkgatersleben.de>

Thanks, it couldn't find the database!
Great help, thanks a lot ;-)


From meesters at uni-mainz.de  Tue Jan  8 11:12:09 2008
From: meesters at uni-mainz.de (Christian Meesters)
Date: Tue, 8 Jan 2008 17:12:09 +0100
Subject: [BioPython] Bio.PDB - adding 'dummy atoms'
In-Reply-To: <320fb6e00801071055n6bcb936dr58e96ac87b6e509d@mail.gmail.com>
References: <1199729639.13152.20.camel@meesters.biologie.uni-mainz.de>
	<320fb6e00801071055n6bcb936dr58e96ac87b6e509d@mail.gmail.com>
Message-ID: <1199808729.5401.75.camel@meesters.biologie.uni-mainz.de>

> I would infer from the error that "Atom" refers to the Bio.PDB.Atom
> module, rather than the Bio.PDB.Atom.Atom class.  How did you do your
> imports?  Try this:
> 
> from Bio.PDB.Atom import Atom
> 
> Peter
Ouch! Next time I'll try the tutor-list ;-).

Thanks a lot.

Christian


From quantrum75 at yahoo.com  Thu Jan 10 19:16:51 2008
From: quantrum75 at yahoo.com (quantrum75)
Date: Thu, 10 Jan 2008 16:16:51 -0800 (PST)
Subject: [BioPython] bio.PDB module
In-Reply-To: <mailman.1663.1199789171.2774.biopython@lists.open-bio.org>
Message-ID: <258224.6110.qm@web31404.mail.mud.yahoo.com>

Hi 
I am a Biopython newbie. I was wondering if someone could point me to (or send me) a script that reads a PDB file and outputs the phi and psi angles of the protein structure; I would be very thankful.
I have read through the Bio.PDB module and structural module documentation, but I still do not know how to tackle the problem. I wish the Bio.PDB documentation were a bit more detailed and included some examples to work with. I would really like to contribute to the project, and if I got an initial idea of how to work with it, maybe I could contribute in some small way.
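
A hedged, standard-library sketch of the underlying calculation: phi for residue i is the dihedral angle C(i-1)-N(i)-CA(i)-C(i), and psi is N(i)-CA(i)-C(i)-N(i+1). The sketch assumes you have already extracted backbone N, CA and C coordinates in chain order (for example from Bio.PDB residues); the dictionary layout and the function names are hypothetical, and chain breaks are not detected.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond b1.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(backbone):
    """backbone: list of dicts with 'N', 'CA', 'C' coordinates, one per residue.

    Returns (phi, psi) per residue; phi of the first residue and psi of the
    last residue are None because their neighbours are missing.
    """
    angles = []
    for i, res in enumerate(backbone):
        phi = psi = None
        if i > 0:
            phi = dihedral(backbone[i - 1]["C"], res["N"], res["CA"], res["C"])
        if i < len(backbone) - 1:
            psi = dihedral(res["N"], res["CA"], res["C"], backbone[i + 1]["N"])
        angles.append((phi, psi))
    return angles
```

The sign follows the usual convention: looking along the central bond, a clockwise rotation of the near bond onto the far bond gives a positive angle.
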
Thanks for your time
Regards
Rama


biopython-request at lists.open-bio.org wrote: Send BioPython mailing list submissions to
 biopython at lists.open-bio.org

To subscribe or unsubscribe via the World Wide Web, visit
 http://lists.open-bio.org/mailman/listinfo/biopython
or, via email, send a message with subject or body 'help' to
 biopython-request at lists.open-bio.org

You can reach the person managing the list at
 biopython-owner at lists.open-bio.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of BioPython digest..."


Today's Topics:

   1. Re: [BioSQL-l] Authority in biodatabase table (Peter)
   2. Re: FormatConverter: from Fasta format to ClustalW format
      (Lee,Byung-chul)
   3. Re: FormatConverter: from Fasta format to ClustalW format (Peter)
   4. Re: FormatConverter: from Fasta format to ClustalW format (Peter)
   5. Bio.Ais (Michiel de Hoon)
   6. Bio.PDB - adding 'dummy atoms' (Christian Meesters)
   7. Re: Bio.PDB - adding 'dummy atoms' (Peter)
   8. blastall does not exist at %s" % blastcmd (Stefanie L?ck)
   9. Re: blastall does not exist at %s" % blastcmd (Peter)


----------------------------------------------------------------------

Message: 1
Date: Wed, 2 Jan 2008 11:57:46 +0000
From: Peter 
Subject: Re: [BioPython] [BioSQL-l] Authority in biodatabase table
To: "Hilmar Lapp" 
Cc: biopython at lists.open-bio.org, biosql-l at lists.open-bio.org
Message-ID:
 <320fb6e00801020357g724917b5s853d99f2f953753a at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On 1/1/08, Hilmar Lapp  wrote:
> (Sorry for this long-too-late reply. Going through old email that got
> left unread or unresponded.)
>
> Peter - you probably implemented something meanwhile that suits your
> needs. Just FYI, BioPerl leaves this empty too. The general notion
> for authority is that of the LSID authority field, but of course you
> won't be able to parse this out of any input file. The value for
> SwissProt would be uniprot.org, for example. For NCBI, I'm not sure -
> NCBI hasn't ever issued any LSIDs, but presumably it would be
> something like ncbi.nlm.nih.gov.
>
>        -hilmar

Thank you Hilmar.

It seem's that the current code in Biopython is fine (the authority
field is left blank by default, unless the user supplies their own
value), and consistent with both BioPerl and BioJava in this regard
(thanks Richard).

Peter


------------------------------

Message: 2
Date: Wed, 02 Jan 2008 22:44:47 +0900
From: "Lee,Byung-chul" 
Subject: Re: [BioPython] FormatConverter: from Fasta format to
 ClustalW format
To: biopython at lists.open-bio.org
Message-ID: <477B954F.9020004 at kaist.ac.kr>
Content-Type: text/plain; charset=EUC-KR


Thank you very much for your kind reply, Peter.

Following your explanation, I tried to use SeqIO, but another error occurred.
This is what I did:

-----------------
from Bio import SeqIO
from Bio.Align import AlignInfo
alignment = SeqIO.to_alignment(SeqIO.parse(open('tmp.fasta'), "fasta"))
summary_align = AlignInfo.SummaryInfo(alignment)
print summary_align.dumb_consensus()
--------------------
but the results are
-----------------
Traceback (most recent call last):
  File "tmp.py", line 16, in <module>
    print summary_align.dumb_consensus()
  File "/var/lib/python-support/python2.5/Bio/Align/AlignInfo.py", line 111, in dumb_consensus
    consensus_alpha = self._guess_consensus_alphabet()
  File "/var/lib/python-support/python2.5/Bio/Align/AlignInfo.py", line 189, in _guess_consensus_alphabet
    ("Non-gapped alphabet found in alignment object.")
ValueError: Non-gapped alphabet found in alignment object.
---------------------
In addition, all sequences have the same length in my tmp.fasta file.
-----
>seq2
DAC
>seq3 
DC-
>seq1 
DAD
>seq4
DDD
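
An alignment needs every row to be the same length, and that is easy to confirm before building any alignment object. A minimal plain-Python sketch (not Biopython code; `read_fasta` and `check_alignment_lengths` are hypothetical helper names) using the four sequences above:

```python
from io import StringIO


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-format handle."""
    title, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            # Use the first word of the title line as the identifier.
            words = line[1:].split()
            title, chunks = (words[0] if words else ""), []
        else:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


def check_alignment_lengths(records):
    """Return {identifier: length}, raising ValueError if lengths differ."""
    lengths = dict((name, len(seq)) for name, seq in records)
    if len(set(lengths.values())) > 1:
        raise ValueError("Unequal sequence lengths: %r" % lengths)
    return lengths


example = StringIO(">seq2\nDAC\n>seq3 \nDC-\n>seq1 \nDAD\n>seq4\nDDD\n")
print(check_alignment_lengths(read_fasta(example)))
```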

Is this problem caused by the Biopython/Martel and mxTextTools versions?
I am using biopython 1.43-2 (ubuntu version) and mxtexttools 3.0.0-2ubuntu1.

What should I do for this? Thanks.

Byung chul.

Peter wrote:
> Hello Byung chul Lee,
>
> On 1/2/08, Lee,Byung-chul wrote:
>   
>> Dear colleagues.
>>
>> I want to use the AlignInfo.SummaryInfo for fasta-format alignment file.
>> I think that to do the process firstly the fasta format should be
>> converted to clustalw format, so I try to use Formatconverter.
>> However, at my trial, I cannot do that.
>>     
>
> Once you have an alignment object (loaded from any file format), this
> should work with AlignInfo.  I don't think you need to convert it from
> FASTA to ClustalW.
>
> I would guess the error you saw is a problem with Biopython/Martel and
> mxTextTools 3.0, which isn't 100% compatible with mxTextTools 2.0.
> What version of Biopython are you using, as I would have expected this
> to work fine with Biopython 1.44?
>
> You could also try using Bio.SeqIO to load the FASTA format alignment
> file instead, see http://biopython.org/wiki/SeqIO
>
> from Bio import SeqIO
> from Bio.Align import AlignInfo
> alignment = SeqIO.to_alignment(SeqIO.parse(open('tmp.fasta'), "fasta"))
> summary_align = AlignInfo.SummaryInfo(alignment)
>
> Peter
>
>   


------------------------------

Message: 3
Date: Wed, 2 Jan 2008 17:46:25 +0000
From: Peter 
Subject: Re: [BioPython] FormatConverter: from Fasta format to
 ClustalW format
To: "Lee,Byung-chul" 
Cc: biopython at lists.open-bio.org
Message-ID:
 <320fb6e00801020946j5b331137s14f9e1d90e888a2e at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Jan 2, 2008 1:44 PM, Lee,Byung-chul  wrote:
> Following your explanation, I tried to use SeqIO, but another error occurred.
> This is what I did:

My fault, sorry. I wasn't at a computer with Biopython installed, I
had to guess.  I'll try and put together a proper example for you
tomorrow.

> Is this problem caused by the Biopython/Martel and mxTextTools versions?
> I am using biopython 1.43-2 (ubuntu version) and mxtexttools 3.0.0-2ubuntu1.

The original problem you reported was due to the combination of
Biopython 1.43 (the Martel module) and mxTextTools 3.0. You can either
update to Biopython 1.44 or downgrade to mxTextTools 2.0 - neither is
going to be very simple if you want to use the Ubuntu repositories.
To avoid this Martel problem, I would suggest you un-install Biopython
1.43 from the Ubuntu repository, and then install Biopython 1.44 from
source.

Peter


------------------------------

Message: 4
Date: Fri, 4 Jan 2008 13:20:26 +0000
From: Peter 
Subject: Re: [BioPython] FormatConverter: from Fasta format to
 ClustalW format
To: "Lee,Byung-chul" 
Cc: biopython at lists.open-bio.org
Message-ID:
 <320fb6e00801040520i11c9a4c4q4449cee34da00706 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Jan 2, 2008 5:46 PM, Peter wrote:
> On Jan 2, 2008 1:44 PM, Lee,Byung-chul wrote:
> > Following your explanation, I tried to use SeqIO, but another error occurred.
> > This is what I did:
>
> My fault, sorry. I wasn't at a computer with Biopython installed, I
> had to guess.  I'll try and put together a proper example for you
> tomorrow.

This should work on Biopython 1.43 or later, I have tested it using
the simple FASTA file you gave earlier:

from Bio.Alphabet.IUPAC import IUPACProtein
from Bio.Alphabet import Gapped
from Bio import SeqIO
from Bio.Align import AlignInfo
gapped_protein = Gapped(IUPACProtein())

records = list(SeqIO.parse(open('tmp.fasta'), "fasta"))
for rec in records :
    #Override the default generic alphabet:
    rec.seq.alphabet = gapped_protein
#Turn these records into an alignment
alignment = SeqIO.to_alignment(records, gapped_protein)
del records

summary_align = AlignInfo.SummaryInfo(alignment)
print summary_align.dumb_consensus()
print summary_align.gap_consensus()

The problem with my previous shorter suggestion was the Bio.SeqIO
FASTA parser returned SeqRecord objects with a generic alphabet, while
the alignment summary expected a gapped alphabet.  I'm beginning to
think that the Bio.SeqIO.parse() function should allow an alphabet to
be specified as an optional argument for this sort of situation.

Alternatively, going back to your original code how about:

from Bio.Fasta import FastaAlign
from Bio.Align import AlignInfo

alignment = FastaAlign.parse_file('tmp.fasta',type='PROTEIN')
summary_align = AlignInfo.SummaryInfo(alignment)
print summary_align.dumb_consensus()
print summary_align.gap_consensus()

This works using Biopython 1.44 with either mxTextTools 2.0 or 3.0.
It should work with older versions of Biopython using mxTextTools 2.0
as well.
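
For readers wondering what the consensus methods in these examples report, here is a simplified, plain-Python majority-threshold consensus. It is not Biopython's implementation: the gap handling (gaps are skipped when counting) and the `consensus` name are assumptions, although the 0.7 threshold and the `X` ambiguity character mirror the defaults of `dumb_consensus()`.

```python
from collections import Counter


def consensus(sequences, threshold=0.7, ambiguous="X", gap="-"):
    """Return a per-column consensus of equal-length aligned strings.

    A column gets its most common residue if that residue makes up at
    least `threshold` of the non-gap characters, otherwise `ambiguous`.
    Columns containing only gaps become `gap`.
    """
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("All aligned sequences must have the same length")
    result = []
    for column in zip(*sequences):
        counts = Counter(c for c in column if c != gap)
        if not counts:
            result.append(gap)
            continue
        residue, count = counts.most_common(1)[0]
        total = sum(counts.values())
        result.append(residue if float(count) / total >= threshold else ambiguous)
    return "".join(result)


aligned = ["DAC", "DC-", "DAD", "DDD"]  # the tmp.fasta example from this thread
print(consensus(aligned))                 # DXX
print(consensus(aligned, threshold=0.5))  # DAD
```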

Peter


------------------------------

Message: 5
Date: Sat, 5 Jan 2008 00:41:25 -0800 (PST)
From: Michiel de Hoon 
Subject: [BioPython] Bio.Ais
To: biopython at lists.open-bio.org, biopython-dev at lists.open-bio.org
Message-ID: <140129.37367.qm at web62402.mail.re1.yahoo.com>
Content-Type: text/plain; charset=iso-8859-1

Hi everybody,

I was checking which Biopython modules access Entrez/GenBank in any way, and in the process found the script example_ais2.py in Bio/Ais/Examples (this is not related to Entrez/GenBank in any way, it just caught my eye because it imports urllib). 
Currently, this example script does not seem to work:

$ python example_ais2.py
Traceback (most recent call last):
  File "example_ais2.py", line 39, in <module>
    immune = Immune( align, alphabet, 100 )
...
TypeError: 'int' object is not iterable

The directory Bio/Ais/Examples and its file example_ais2.py only appear in CVS and are not included in Biopython releases.

Does anybody know how to fix this example? If not, what should we do with it?

--Michiel.

       


------------------------------

Message: 6
Date: Mon, 7 Jan 2008 19:13:59 +0100
From: Christian Meesters 
Subject: [BioPython] Bio.PDB - adding 'dummy atoms'
To: "biopython at lists.open-bio.org" 
Message-ID: <1199729639.13152.20.camel at meesters.biologie.uni-mainz.de>
Content-Type: text/plain

Hoi,

I'd like to add 'dummy atoms' to a Bio.PDB Structure object. So far, I
have this approach:

new = Atom('OX', array([x, y, z]), 0, 1, 0, " OX ", serial_number)
residue.add(new)

Here x, y, and z are floating point numbers and serial_number is an
integer. 'residue' is a 'Residue' I'm iterating over. However, I keep
getting the following error message and don't have a clue, how to
proceed:

new = Atom('OX', array([x, y, z]), 0, 1, 0, " OX ", serial_number)
TypeError: object of type 'module' is not callable

Does anyone have a hint for me on how to actually add an atom, or what's
wrong here?

TIA
Christian


------------------------------

Message: 7
Date: Mon, 7 Jan 2008 18:55:57 +0000
From: Peter 
Subject: Re: [BioPython] Bio.PDB - adding 'dummy atoms'
To: "Christian Meesters" 
Cc: "biopython at lists.open-bio.org" 
Message-ID:
 <320fb6e00801071055n6bcb936dr58e96ac87b6e509d at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Christian Meesters wrote:
> I'd like to add 'dummy atoms' to a Bio.PDB Structure object. So far, I
> have this approach:
> ...
> new = Atom('OX', array([x, y, z]), 0, 1, 0, " OX ", serial_number)
> TypeError: object of type 'module' is not callable
>
> Does anyone have a hint for me on how to actually add an atom, or what's
> wrong here?

I would infer from the error that "Atom" refers to the Bio.PDB.Atom
module, rather than the Bio.PDB.Atom.Atom class.  How did you do your
imports?  Try this:

from Bio.PDB.Atom import Atom
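
The same module-versus-class mix-up can be reproduced with the standard library's `datetime` module, which, like `Bio.PDB.Atom`, contains a class with the same name as its module. This is only an analogy using stdlib names, not Bio.PDB code:

```python
import datetime  # this binds the *module* named datetime

try:
    # Calling the module fails, just like calling Bio.PDB.Atom (the module)
    # instead of Bio.PDB.Atom.Atom (the class).
    datetime(2008, 1, 7)
    called_ok = True
except TypeError as err:
    called_ok = False
    print("TypeError:", err)

from datetime import datetime  # now the name refers to the *class*

stamp = datetime(2008, 1, 7)
print(stamp.date())
```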

Peter


------------------------------

Message: 8
Date: Tue, 8 Jan 2008 10:06:40 +0100
From: Stefanie Lück
Subject: [BioPython] blastall does not exist at %s" % blastcmd
To: 
Message-ID: <002301c851d5$c7daac60$1022a8c0 at ipkgatersleben.de>
Content-Type: text/plain; charset="iso-8859-1"

Hi!

I'm trying to get a local blast running. I proceeded as described in the cookbook but I always get this error message:
>>>
Traceback (most recent call last):
  File "F:\Blast\blast.py", line 10, in <module>
    my_blast_db, my_blast_file)
  File "C:\Python25\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 1499, in blastall
    raise ValueError, "blastall does not exist at %s" % blastcmd
ValueError: blastall does not exist at C:\Blast\bin\blastall.exe
<<<

>>>
My Code:

import Bio
from Bio.Blast import NCBIStandalone
import os

my_blast_db = r"F:\Blast\primerdb"
my_blast_file = "test.fasta"
my_blast_exe = r"C:\Blast\bin\blastall.exe"

result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn",
my_blast_db, my_blast_file)
blast_results = result_handle.read()
save_file = open("my_blast.xml", "w")
save_file.write(blast_results)
save_file.close()
<<<

blastall.exe is in this folder (checked with os.listdir()), but the tool can't find it.

I'm using Python 2.5 and biopython-1.44.win32-py2.5.exe.

Does anyone have an idea where the problem is?

Greetings 
Stefanie


------------------------------

Message: 9
Date: Tue, 8 Jan 2008 10:46:02 +0000
From: Peter 
Subject: Re: [BioPython] blastall does not exist at %s" % blastcmd
To: "Stefanie Lück"
Cc: biopython at lists.open-bio.org
Message-ID:
 <320fb6e00801080246t5aa515ccuc8699134b533e8b9 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Jan 8, 2008 9:06 AM, Stefanie Lück wrote:
> Hi!
>
> I'm trying to get a local blast running. I proceeded as described in the cookbook
> but I always get this error message:
> >>>
> Traceback (most recent call last):
>   File "F:\Blast\blast.py", line 10, in <module>
>     my_blast_db, my_blast_file)
>   File "C:\Python25\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 1499, in blastall
>     raise ValueError, "blastall does not exist at %s" % blastcmd
> ValueError: blastall does not exist at C:\Blast\bin\blastall.exe
> <<<
>
> >>>
> My Code:
>
> import Bio
> from Bio.Blast import NCBIStandalone
> import os
>
> my_blast_db = r"F:\Blast\primerdb"
> my_blast_file = "test.fasta"
> my_blast_exe = r"C:\Blast\bin\blastall.exe"
>
> result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn",
> my_blast_db, my_blast_file)
> ...
> blastall.exe is in this folder (checked by os.listdir()) but can't be found from the tool.
>

Could you try this, which is the test done in the Biopython blastall
function that triggers the error message you saw:

print os.path.exists(my_blast_exe)

Could you also double check the path is C:\Blast\bin\blastall.exe and
not perhaps C:\Blast\blastall.exe (the NCBI changed this at some point
on Windows). Also did you install it to the F: drive where your
database is, rather than C: ?
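
Following this suggestion, a small check over several candidate locations shows which one actually exists before calling `blastall`. A minimal sketch; the candidate paths below are hypothetical examples based on the locations mentioned in this thread, not an official list of install locations:

```python
import os


def first_existing(paths):
    """Return the first path in `paths` that exists on disk, or None."""
    for path in paths:
        if os.path.exists(path):
            return path
    return None


# Hypothetical candidate locations for the standalone blastall executable.
candidates = [
    r"C:\Blast\bin\blastall.exe",
    r"C:\Blast\blastall.exe",
    r"F:\Blast\bin\blastall.exe",
]
found = first_existing(candidates)
print(found if found else "blastall.exe not found in any candidate location")
```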

> I'm using Python 2.5 and biopython-1.44.win32-py2.5.exe.

What version of standalone blast do you have?

Peter


------------------------------

_______________________________________________
BioPython mailing list  -  BioPython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython


End of BioPython Digest, Vol 61, Issue 2
****************************************



From lee.byung-chul at kaist.ac.kr  Thu Jan 10 22:15:02 2008
From: lee.byung-chul at kaist.ac.kr (Lee,Byung-chul)
Date: Fri, 11 Jan 2008 12:15:02 +0900
Subject: [BioPython] bio.PDB module
In-Reply-To: <258224.6110.qm@web31404.mail.mud.yahoo.com>
References: <258224.6110.qm@web31404.mail.mud.yahoo.com>
Message-ID: <4786DF36.6070102@kaist.ac.kr>


quantrum75 wrote:

> > Hi 
> > I am a biopython newbie. I was wondering if someone could show me or send me ( I would be thankful) where I could find a script which can read a pdb file and output the phi and psi angles of the protein structure.
> > I have read through the bio.PDB module and structural module documentation, but still do not have an idea on how to proceed to tackle the problem. I wish the bio.PDB documentation was a bit more detailed and included some examples to work with. I really would like to contribute to the project and maybe if I got an initial idea on how to work with the same, I can contribute in some small way.
> > Thanks for your time
> > Regards
> > Rama
> >   
>   
I think the web page below can help you; have a look:
http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/ramachandran/calculate/
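
Phi and psi are backbone torsion (dihedral) angles: phi is defined by the atoms C(i-1), N(i), CA(i), C(i), and psi by N(i), CA(i), C(i), N(i+1). The core calculation is the dihedral angle between four points, sketched below in plain Python with made-up coordinates; extracting the atom coordinates from a parsed PDB structure is not shown here.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (-180..180) defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates, not taken from a real PDB file.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))   # cis: 0.0
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))  # trans
```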

Byung chul.


From mjldehoon at yahoo.com  Fri Jan 11 06:16:45 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 11 Jan 2008 03:16:45 -0800 (PST)
Subject: [BioPython] [Biopython-dev] Bio.Ais
In-Reply-To: <140129.37367.qm@web62402.mail.re1.yahoo.com>
Message-ID: <426295.9925.qm@web62415.mail.re1.yahoo.com>

Looking at this again, currently we have no documentation for Bio.Ais, no maintainer, and no apparent users (at least, I couldn't find any in the mailing list archives). Would anybody mind very much if I mark this module as deprecated?
Just to find out if there are any users of this code out there.
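
In Python, marking a module as deprecated is commonly done by issuing a `DeprecationWarning` when it is imported. A minimal sketch using the standard `warnings` module; this is a generic pattern, not necessarily how Biopython handles deprecations, and `warn_deprecated` is a hypothetical helper name:

```python
import warnings


def warn_deprecated(module_name):
    """Emit a DeprecationWarning for a module that is slated for removal."""
    warnings.warn(
        "%s is deprecated and may be removed in a future release. "
        "Please tell the developers if you still use it." % module_name,
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the code calling this helper
    )


# At the top of a module such as Bio/Ais/__init__.py one could call:
# warn_deprecated("Bio.Ais")
```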

--Michiel.


From biopython at maubp.freeserve.co.uk  Fri Jan 11 06:51:41 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 11 Jan 2008 11:51:41 +0000
Subject: [BioPython] bio.PDB module
In-Reply-To: <258224.6110.qm@web31404.mail.mud.yahoo.com>
References: <mailman.1663.1199789171.2774.biopython@lists.open-bio.org>
	<258224.6110.qm@web31404.mail.mud.yahoo.com>
Message-ID: <320fb6e00801110351x204102fft44dd3b1e914bfee3@mail.gmail.com>

On Jan 11, 2008 12:16 AM, quantrum75 <quantrum75 at yahoo.com> wrote:
> Hi
> I am a biopython newbie. I was wondering if someone could show me or send me
> ( I would be thankful) where I could find a script which can read a pdb file and output
> the phi and psi angles of the protein structure.

I see Byung chul has already suggested reading this page:
http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/ramachandran/calculate/

Do you think we should incorporate some of that into the main Biopython
documentation?

> I have read through the bio.PDB module and structural module documentation,
> but still do not have an idea on how to proceed to tackle the problem. I wish the
> bio.PDB documentation was a bit more detailed and included some examples to
> work with.

Have you read the Biopython Structural Bioinformatics FAQ,
http://biopython.org/DIST/docs/cookbook/biopdb_faq.pdf
This is linked to from our documentation webpage, but doesn't seem to
be mentioned in the main Biopython Tutorial and Cookbook...

> I really would like to contribute to the project and maybe if I got an
> initial idea on how to work with the same, I can contribute in some small way.

Maybe you could start a "Getting started with Bio.PDB" page on the Wiki?

Peter

From quantrum75 at yahoo.com  Fri Jan 11 08:26:23 2008
From: quantrum75 at yahoo.com (quantrum75)
Date: Fri, 11 Jan 2008 05:26:23 -0800 (PST)
Subject: [BioPython] bio.PDB module
In-Reply-To: <320fb6e00801110351x204102fft44dd3b1e914bfee3@mail.gmail.com>
Message-ID: <496455.11121.qm@web31409.mail.mud.yahoo.com>

Hi Peter,
  Thanks for your reply. I did go through the links you mentioned, including the structural bioinformatics FAQ. However, I feel the documentation for the bio.PDB module is seriously short on practical examples for a person like me who likes to learn from examples.
  I would love to be able to write a "getting started with bio.PDB" wiki page or document with examples. However, I first need to understand the basics of using the module, which I have not been able to get from the current documentation; that is why I asked for a script that can compute the phi and psi angles from a PDB file.
  I'll see what I can do, and if you could direct me to any resources, that would be great.
  Thanks
  Rama


From jdieten at gmail.com  Tue Jan 15 08:26:08 2008
From: jdieten at gmail.com (Joost van Dieten)
Date: Tue, 15 Jan 2008 14:26:08 +0100
Subject: [BioPython] [Biopython] Blast problem
Message-ID: <4ac065b80801150526q79215288k7e6a0e633d83f1c4@mail.gmail.com>

Hi Everyone,

I'm having a problem with the hsp.match function in the Bio-Python Blast
module.
A few weeks ago the hsp.match returned me the following:

ATGGCA++TGG

But now it gives me:

ATGGCA TGG

I can't see the number of gaps anymore; does anyone have a solution for this?

Best regards,

Joost van Dieten

From biopython at maubp.freeserve.co.uk  Tue Jan 15 09:09:47 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 15 Jan 2008 14:09:47 +0000
Subject: [BioPython] [Biopython] Blast problem
In-Reply-To: <4ac065b80801150526q79215288k7e6a0e633d83f1c4@mail.gmail.com>
References: <4ac065b80801150526q79215288k7e6a0e633d83f1c4@mail.gmail.com>
Message-ID: <320fb6e00801150609y16c77bdch927dd6d9689996a5@mail.gmail.com>

Hi Joost,

> I'm having a problem with the hsp.match function in the Bio-Python Blast
> module. A few weeks ago the hsp.match returned me the following:
>
> ATGGCA++TGG
>
> But now it gives me:
>
> ATGGCA TGG
>
> I can't see the number of gaps anymore; does anyone have a solution for this?

Are you using the online version of blast with Biopython?  Perhaps the
NCBI changed something.  Are you parsing the XML output or the plain
text?  Can you provide any more information (e.g. which version of
Biopython).
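
Whatever changed in the match line, gaps can also be counted from the aligned query and subject strings themselves, where they appear as `-`. A plain-Python sketch, assuming `hsp.query` and `hsp.sbjct` hold the gapped, equal-length alignment rows; the example rows are invented:

```python
def count_gaps(query, sbjct, gap="-"):
    """Return (gaps in query, gaps in subject, alignment columns with a gap)."""
    if len(query) != len(sbjct):
        raise ValueError("Aligned rows must have the same length")
    query_gaps = query.count(gap)
    sbjct_gaps = sbjct.count(gap)
    gapped_columns = sum(1 for q, s in zip(query, sbjct) if gap in (q, s))
    return query_gaps, sbjct_gaps, gapped_columns


# Invented rows standing in for hsp.query and hsp.sbjct from one HSP.
print(count_gaps("ATGGCA--TGG", "ATGGCATTTGG"))  # (2, 0, 2)
```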

Thanks

Peter

From luca.beltrame at unimi.it  Thu Jan 17 09:12:55 2008
From: luca.beltrame at unimi.it (Luca Beltrame)
Date: Thu, 17 Jan 2008 15:12:55 +0100
Subject: [BioPython] KEGG Gene parser?
Message-ID: <200801171512.55898.luca.beltrame@unimi.it>

Hello.

I'd like to know if there is a parser that can parse KEGG gene entries. As far 
as I can see, Bio.KEGG can only do Compound and Enzyme. 
If there is a need, I'm thinking about writing one, but since someone posted
something in 2004 (now no longer available), I'm asking the list first.

Thanks.

From lueck at ipk-gatersleben.de  Mon Jan 21 06:21:52 2008
From: lueck at ipk-gatersleben.de (Stefanie Lück)
Date: Mon, 21 Jan 2008 12:21:52 +0100
Subject: [BioPython] blastall questions (output, full length subject)
Message-ID: <001901c85c1f$d279ac30$1022a8c0@ipkgatersleben.de>

Hi!

I need some advice again for a local blast with blastall.

First of all, everything works fine, I just have some questions on how to continue:

1) How can I see the full length of the subject? I can only ever see the part that matches the query.

2) What do you suggest for continuing with the xml output? I want to sort the Hits by % of matching and my idea was to put everything in a dictionary (%match as key and all the remaining information as values).

Is this the right way?


Greetings

Stefanie

From winter at biotec.tu-dresden.de  Mon Jan 21 08:18:15 2008
From: winter at biotec.tu-dresden.de (Christof Winter)
Date: Mon, 21 Jan 2008 14:18:15 +0100
Subject: [BioPython] blastall questions (output, full length subject)
In-Reply-To: <001901c85c1f$d279ac30$1022a8c0@ipkgatersleben.de>
References: <001901c85c1f$d279ac30$1022a8c0@ipkgatersleben.de>
Message-ID: <47949B97.90205@biotec.tu-dresden.de>

Stefanie Lück wrote:
> Hi!
> 
> I need some advice again for a local blast with blastall.
> 
> First of all, everything works fine, I just have some questions on how to
> continue:
> 
> 1) How can I see the full length of the subject? I can only ever see the
> part that matches the query.

Hi Stefanie,

you suffered from the slightly confusing naming in the BioPython NCBIXML class.
Here is an explanation:

alignment.length = total length of unaligned hit sequence
record.query_letters = length of query sequence
len(hsp.query) = len(hsp.match) = len(hsp.sbjct) = length of alignment

with

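from Bio.Blast import NCBIXML

blast_results_file = "my_blast.xml"  # e.g. the XML file saved by blastall

parser = NCBIXML.BlastParser()
records = parser.parse(open(blast_results_file))

for record in records:
    for alignment in record.alignments:
        for hsp in alignment.hsps:
            # hit length, query length and alignment length, as listed above
            print alignment.length, record.query_letters, len(hsp.query)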

> 2) What do you suggest for continuing with the xml output? I want to sort
> the Hits by % of matching and my idea was to put everything in a
> dictionary (%match as key and all the remaining information as values).

If you refer to the sequence identity percentage, you can use
sequenceIdentity = int(hsp.identities)*100/int(len(hsp.query))
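
Note that under Python 2, `int * 100 / int` is integer division, so the percentage above is truncated to a whole number. A sketch that keeps the fraction; `identities` and `align_length` stand in for `hsp.identities` and `len(hsp.query)`, and the numbers are invented:

```python
def percent_identity(identities, align_length):
    """Percentage of identical positions over the alignment length."""
    if align_length <= 0:
        raise ValueError("Alignment length must be positive")
    return 100.0 * identities / align_length


print(percent_identity(9, 12))   # 75.0
print(percent_identity(10, 12))  # about 83.33 rather than a truncated 83
```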

To use the sequence identity as key in a dictionary, you would have to keep a 
list (or set) of records as value, since different records (hits) can have the 
same sequence identity.

I would recommend just keeping a set (or list) of records, and using the key or
cmp parameter of Python's sort function to sort by one field of the record:
http://wiki.python.org/moin/HowTo/Sorting

If you only need some information of the record, it might be even easier to 
store this information in a list, and keep a set (or list) of these lists.
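
Following that suggestion, a sketch of keeping a list of small per-hit records and sorting them with `key=`. The `Hit` tuple, its fields, and the example values are hypothetical stand-ins for data you would pull from each alignment and HSP:

```python
from collections import namedtuple

# Hypothetical summary of one hit, filled from alignment/HSP attributes.
Hit = namedtuple("Hit", ["title", "length", "identities", "align_length"])


def identity(hit):
    """Percent identity of a hit's alignment (float division)."""
    return 100.0 * hit.identities / hit.align_length


hits = [
    Hit("subject A", 500, 45, 50),
    Hit("subject B", 320, 50, 50),
    Hit("subject C", 780, 45, 60),
]

# Best first; Python's sort is stable, so ties keep their original order.
for hit in sorted(hits, key=identity, reverse=True):
    print("%-10s %6.2f%%" % (hit.title, identity(hit)))
```

Unlike a dictionary keyed on the percentage, this keeps hits with identical scores without any extra bookkeeping.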

HTH,
Christof

PS: Maybe we could enrich NCBIXML.py with some more meaningful variable names?


From biopython at maubp.freeserve.co.uk  Mon Jan 21 10:15:45 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 21 Jan 2008 15:15:45 +0000
Subject: [BioPython] KEGG Gene parser?
In-Reply-To: <200801171512.55898.luca.beltrame@unimi.it>
References: <200801171512.55898.luca.beltrame@unimi.it>
Message-ID: <320fb6e00801210715n33093e95t40de5f921fe1fd47@mail.gmail.com>

On Jan 17, 2008 Luca Beltrame wrote:
> Hello.
>
> I'd like to know if there is a parser that can parse KEGG gene entries. As far
> as I can see, Bio.KEGG can only do Compound and Enzyme.

And there is also Bio.KEGG.Map, but you are right, there doesn't seem
to be anything for KEGG gene entries.

> Should there be the need I'm thinking about writing one, but since in 2004
> someone had posted something (now no longer available), I'm asking the list
> first.

It looks like no one else is working on any KEGG code, so if you still want to
write something it could be useful. Are you happy to write this in
a similar style to the existing Bio.KEGG modules, and put together
some basic documentation and a test case too?
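
For anyone starting on such a parser: KEGG flat-file entries put the field name in the first 12 columns, continue a field on lines whose first 12 columns are blank, and end each entry with `///`. The sketch below assumes exactly that layout and treats indented sub-field names like top-level fields; it is a plain-Python illustration, not written in the style of the existing Bio.KEGG modules, and the example entry is invented.

```python
from io import StringIO


def parse_kegg(handle):
    """Yield one dict per KEGG-style entry, mapping field -> list of lines.

    Assumes field names in columns 1-12, values from column 13 on,
    continuation lines with blank first 12 columns, and '///' ending
    each entry.
    """
    entry, key = {}, None
    for line in handle:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith("///"):
            if entry:
                yield entry
            entry, key = {}, None
            continue
        name, value = line[:12].strip(), line[12:].strip()
        if name:
            key = name
            entry.setdefault(key, []).append(value)
        elif key is not None:
            entry[key].append(value)
    if entry:
        yield entry


lines = [
    "ENTRY".ljust(12) + "gene0001          CDS",
    "NAME".ljust(12) + "exampleA",
    "DEFINITION".ljust(12) + "hypothetical example protein",
    " " * 12 + "(made-up continuation line)",
    "///",
]
for record in parse_kegg(StringIO("\n".join(lines) + "\n")):
    print(record["NAME"], record["DEFINITION"])
```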

Peter

From jkhilmer at gmail.com  Tue Jan 22 14:41:07 2008
From: jkhilmer at gmail.com (Jonathan Hilmer)
Date: Tue, 22 Jan 2008 12:41:07 -0700
Subject: [BioPython] KEGG Gene parser?
In-Reply-To: <320fb6e00801210715n33093e95t40de5f921fe1fd47@mail.gmail.com>
References: <200801171512.55898.luca.beltrame@unimi.it>
	<320fb6e00801210715n33093e95t40de5f921fe1fd47@mail.gmail.com>
Message-ID: <81277ce10801221141ya4f0d3fr87858102274d6e2e@mail.gmail.com>

Luca,

My lab also has interest in KEGG gene entries.  Although I have
minimal experience in professional Python programming, I would be
happy to help in any way: perhaps testing etc.


Jonathan Hilmer
Bothner Research Group
Montana State University



From bsantos at biocant.pt  Wed Jan 23 12:55:18 2008
From: bsantos at biocant.pt (Bruno Santos)
Date: Wed, 23 Jan 2008 17:55:18 +0000
Subject: [BioPython] Problems runing BLAST
Message-ID: <20080123175518.eab8a089@mail.biocant.pt>

Hi
I used to run blastall without any problems, but now I have moved all my scripts to a server running Fedora Core 6, and I get the following error when parsing the blast results:

Traceback (most recent call last):
  File "/usr/local/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py", line 568, in parse
    raise SyntaxError("Your XML file did not start <?xml...")
SyntaxError: Your XML file did not start <?xml...

I'm running blastall version 2.2.16.
My code looks like this:

from Bio.Blast import NCBIStandalone, NCBIXML

my_blast_file = "file.fasta"
my_blast_exe = "/usr/local/bin/blastall"
result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn", my_blast_db, my_blast_file, expectation)
blast_records = NCBIXML.parse(result_handle)

From sbassi at gmail.com  Wed Jan 23 14:40:25 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Wed, 23 Jan 2008 17:40:25 -0200
Subject: [BioPython] Problems runing BLAST
In-Reply-To: <20080123175518.eab8a089@mail.biocant.pt>
References: <20080123175518.eab8a089@mail.biocant.pt>
Message-ID: <b43bf2080801231140h6202a260h9f1beaed698a7089@mail.gmail.com>

On Jan 23, 2008 3:55 PM, Bruno Santos <bsantos at biocant.pt> wrote:
>     raise SyntaxError("Your XML file did not start <?xml...")
> SyntaxError: Your XML file did not start <?xml...

Can you show us the result of:
head your_xml_file.xml

From biopython at maubp.freeserve.co.uk  Wed Jan 23 16:07:07 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 23 Jan 2008 21:07:07 +0000
Subject: [BioPython] Problems runing BLAST
In-Reply-To: <b43bf2080801231140h6202a260h9f1beaed698a7089@mail.gmail.com>
References: <20080123175518.eab8a089@mail.biocant.pt>
	<b43bf2080801231140h6202a260h9f1beaed698a7089@mail.gmail.com>
Message-ID: <320fb6e00801231307l5213397ch1c20619b2acc2880@mail.gmail.com>

On 1/23/08, Sebastian Bassi <sbassi at gmail.com> wrote:
> On Jan 23, 2008 3:55 PM, Bruno Santos <bsantos at biocant.pt> wrote:
> >     raise SyntaxError("Your XML file did not start <?xml...")
> > SyntaxError: Your XML file did not start <?xml...
>
> Can you show us the result of:
> head your_xml_file.xml

Seeing the start of the XML file would be very helpful.  And if it is
empty, what has been written to the error handle?  I would guess maybe
the database is in a new location or something simple like that...

print error_info.read()

Another thing to check is the version of Biopython on the new machine.
Earlier versions would default to asking blast for plain text output
instead of XML.
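
The SyntaxError above means the text handed to NCBIXML did not begin with `<?xml`; one way this happens is when blastall writes no XML at all, for example because it cannot find the database. A minimal plain-Python sketch of checking for that before parsing; `check_blast_xml` is a hypothetical helper, and the `StringIO` objects stand in for the `result_handle` and `error_info` handles returned by `blastall`:

```python
from io import StringIO


def check_blast_xml(result_handle, error_handle):
    """Return a fresh handle on the BLAST output if it looks like XML.

    Raises ValueError (including any error text) if the output is empty
    or does not start with '<?xml'.
    """
    data = result_handle.read()
    if not data.lstrip().startswith("<?xml"):
        errors = error_handle.read().strip()
        raise ValueError("BLAST output is not XML (starts %r); errors: %s"
                         % (data[:40], errors or "<none>"))
    return StringIO(data)


# Simulated handles: empty output plus some placeholder error text.
try:
    check_blast_xml(StringIO(""), StringIO("simulated error text"))
except ValueError as err:
    print(err)
```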

Peter

From bsantos at biocant.pt  Fri Jan 25 07:15:56 2008
From: bsantos at biocant.pt (Bruno Santos)
Date: Fri, 25 Jan 2008 12:15:56 -0000
Subject: [BioPython] Problems runing BLAST
In-Reply-To: <320fb6e00801231307l5213397ch1c20619b2acc2880@mail.gmail.com>
References: <20080123175518.eab8a089@mail.biocant.pt>	
	<b43bf2080801231140h6202a260h9f1beaed698a7089@mail.gmail.com>
	<320fb6e00801231307l5213397ch1c20619b2acc2880@mail.gmail.com>
Message-ID: <000301c85f4c$0bd830d0$23889270$@pt>

I wasn't using any XML file as intermediate, I was parsing the blast results
directly. But it was really a problem with the databases. Now it's solved.

My question now is another one: I'm blasting a multifasta file, so I need to
know which results belong to which query sequence ID. I know I can simply
assume that the blast results are ordered according to the sequences in the
fasta file, but is there any other way to obtain the query ID directly using the
Blast Record class?

Thanks in advance,
Bruno Santos


From winter at biotec.tu-dresden.de  Fri Jan 25 08:02:06 2008
From: winter at biotec.tu-dresden.de (Christof Winter)
Date: Fri, 25 Jan 2008 14:02:06 +0100
Subject: [BioPython] Problems runing BLAST
In-Reply-To: <000301c85f4c$0bd830d0$23889270$@pt>
References: <20080123175518.eab8a089@mail.biocant.pt>		<b43bf2080801231140h6202a260h9f1beaed698a7089@mail.gmail.com>	<320fb6e00801231307l5213397ch1c20619b2acc2880@mail.gmail.com>
	<000301c85f4c$0bd830d0$23889270$@pt>
Message-ID: <4799DDCE.1030205@biotec.tu-dresden.de>

Bruno Santos wrote:
> I wasn't using any XML file as an intermediate; I was parsing the blast results
> directly. But it really was a problem with the databases. Now it's solved.
> 
> My question now is a different one: I'm blasting a multifasta file, so I need to
> know which results belong to which query sequence ID. I know I can simply
> assume that the blast results are ordered according to the sequences in the
> fasta file, but is there any other way to obtain the query ID directly using the
> Blast Record class?

record.query?

Try exploring your Blast Record instance in a Python shell with the dir function:

 >>> record
<Bio.Blast.Record.Blast instance at 0xb78341cc>
 >>> dir(record)
['__doc__', '__init__', '__module__', '_num_letters_in_database', 'alignments', 
'application', 'blast_cutoff', 'database', 'database_length', 
'database_letters', 'database_name', 'database_sequences', 'date', 
'descriptions', 'dropoff_1st_pass', 'effective_database_length', 
'effective_hsp_length', 'effective_query_length', 'effective_search_space', 
'effective_search_space_used', 'expect', 'filter', 'frameshift', 
'gap_penalties', 'gap_trigger', 'gap_x_dropoff', 'gap_x_dropoff_final', 
'gapped', 'hsps_gapped', 'hsps_no_gap', 'hsps_prelim_gapped', 
'hsps_prelim_gapped_attemped', 'ka_params', 'ka_params_gap', 'matrix', 
'multiple_alignment', 'num_good_extends', 'num_hits', 'num_letters_in_database', 
'num_seqs_better_e', 'num_sequences', 'num_sequences_in_database', 
'posted_date', 'query', 'query_id', 'query_length', 'query_letters', 
'reference', 'sc_match', 'sc_mismatch', 'threshold', 'version', 'window_size']

Cheers,
Christof

> 
> Thanks in advance,
> Bruno Santos


From mjldehoon at yahoo.com  Fri Jan 25 08:04:38 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 25 Jan 2008 05:04:38 -0800 (PST)
Subject: [BioPython] Bio.EUtils
Message-ID: <8786.65209.qm@web62404.mail.re1.yahoo.com>

Hello everybody,

I am looking at the various ways Biopython interacts with NCBI's Entrez search engine, and if possible to organize and document this a bit more. Currently there are several modules that interact with Entrez. The most extensive one is Bio.EUtils, but there are also simpler modules such as Bio.WWW.NCBI. I was wondering:
1) Is anybody using Bio.EUtils?
2) If so, could you give an example script that uses Bio.EUtils?
So we can get an idea of the amount of overlap between Bio.EUtils and Bio.WWW.NCBI and others.

Thanks!

--Michiel.


---------------------------------
Never miss a thing.   Make Yahoo your homepage.

From mjldehoon at yahoo.com  Sat Jan 26 00:38:01 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 25 Jan 2008 21:38:01 -0800 (PST)
Subject: [BioPython] Bio.EUtils
In-Reply-To: <d9fd76050801251103lb68260sd0e7d759cdf6b5e5@mail.gmail.com>
Message-ID: <367303.23759.qm@web62406.mail.re1.yahoo.com>

Dear Rohini,
 
 Thank you for your example. It was very helpful.
 Just a few questions about it:
 
 > dbinfo = EUtils.databases['pubmed']
 Is this statement needed? The variable dbinfo is not used in your example, and the example works fine without this statement.
 
 > Then parse the xml or text lines.
  Do you parse the xml or text output yourself, or do you use any Biopython tools for that?
 
 The following does almost the same with Bio.WWW.NCBI instead of Bio.EUtils:
 >>> from Bio.WWW import NCBI
 >>> lines = NCBI.efetch(db='pubmed', id=listids, retmode='xml' ).readlines()
 # or retmode='text'
 I am saying "almost" the same, because currently Bio.WWW.NCBI.efetch does not handle multiple listids (so it accepts listids = '18211820' but not listids = ['18211820', '18211718', '18178374']). However, this can be fixed very easily in Biopython.
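 NCBI's E-utilities accept several UIDs in a single id parameter as a comma-separated list, so one way to fetch a batch without depending on either Biopython module is to build the EFetch URL directly. A minimal sketch in modern Python 3, not the Bio.WWW.NCBI API; the helper name build_efetch_url is hypothetical and the base URL is assumed to be the current E-utilities endpoint:

```python
from urllib.parse import urlencode

# Current NCBI E-utilities EFetch endpoint (an assumption for this sketch).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(ids, db="pubmed", retmode="xml", rettype=None):
    """Build one EFetch URL for one or many UIDs (hypothetical helper).

    A single string is treated as one UID; a list of UIDs is joined with
    commas, which urlencode escapes as %2C.
    """
    if isinstance(ids, str):
        ids = [ids]
    params = [("db", db), ("id", ",".join(ids)), ("retmode", retmode)]
    if rettype is not None:
        params.append(("rettype", rettype))
    return EFETCH_URL + "?" + urlencode(params)


print(build_efetch_url(["18211820", "18211718", "18178374"]))
```

 Joining the IDs yourself keeps the request to one HTTP call regardless of how the calling module treats lists.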
 My last question is: Is this sufficient for your needs? Or do you see some advantage to using Bio.EUtils over Bio.WWW.NCBI?
 
 Thanks again,
 
 --Michiel.
 
 
Rohini Damle <rohini.damle at gmail.com> wrote: Hi,
Here is how I use Bio.EUtils:

from Bio import EUtils
from Bio.EUtils import DBIdsClient

dbinfo = EUtils.databases['pubmed']
# listids is a list of pubmed ids
record = DBIdsClient.from_dbids(EUtils.DBIds("pubmed", listids))
rec2 = record.efetch(retmode="xml", rettype=None).readlines()
# or, if you want to parse the abstract in text format:
# rec2 = record.efetch(retmode="text", rettype="abstract").readlines()
Then parse the xml or text lines.

Thanks
-Rohini.
 

 On Jan 25, 2008 5:04 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
 Hello everybody,

I am looking at the various ways Biopython interacts with NCBI's Entrez search engine, and if possible to organize and document this a bit more. Currently there are several modules that interact with Entrez. The most extensive one is Bio.EUtils, but there are also simpler modules such as Bio.WWW.NCBI. I was wondering:
 1) Is anybody using Bio.EUtils?
2) If so, could you give an example script that uses Bio.EUtils?
So we can get an idea of the amount of overlap between Bio.EUtils and Bio.WWW.NCBI and others.

Thanks!

 --Michiel.
 

_______________________________________________
BioPython mailing list  -  BioPython at lists.open-bio.org
 http://lists.open-bio.org/mailman/listinfo/biopython


From rjalves at igc.gulbenkian.pt  Mon Jan 28 04:58:50 2008
From: rjalves at igc.gulbenkian.pt (Renato Alves)
Date: Mon, 28 Jan 2008 09:58:50 +0000
Subject: [BioPython] Translation issues
Message-ID: <479DA75A.6070804@igc.gulbenkian.pt>

Hi.

I'm trying to automate and validate the process of translating
sequences downloaded from NCBI.

Basically I fetch a GenBank file, extract the DNA sequences and use the
Translation module of BioPython to check if the translation matches. The
problem is that the starting amino acid in NCBI is always M, but with the
Translation module it isn't, even if the codon is marked as "starting" in
the corresponding codon table.

So for instance, the sequence:

"TTGGATTATTTAATAGAGGGTTTAAGTTATAATCCTGTAGACCACACAGCTACATCTGGACCAACTGTAATGGAAGCTGCACTGATTGCTAA
ACATGTTTATTCAGGGGAAAAAGGAGATGAATTACCCGGTGGATGGAAAATGCTTGAAGATCCATATATGGTTGGAGGTCTTCGAATGGGC
GTATATGGGAGAAAAGGTGAGGATGGAGAGATGGAATATGTAATTGCAAATGCAGGAACAGAACCTACTAGTTTGATAGATTGGGAGAATA
ATTTGAAACAACCTTTTGGGAAATCAGAAGATATGAAAAATTCTTTAGCTTTTGTTGAAGAGTTTATGAAAAACAATCCAAGTATTAATGTAA
CATTTGTTGGACATTCAAAAGGTGGGGCTGAAGCAGCTGCAAATGCGGTACTTACAAATAGGAATGCAATACTATTTAATCCTGCCACAGTG
AACTTAGAATCATATTTAAAGCCATATGGTGTGAACAAGTCAAATTATACTGCTGAGATGACGGCATTTATTGTAGAAGACGAAATTTTGAATA
ATATCTTTGGATTTATATCAACGCCGATAGACAAGGTAGTTTATTTACCCAGACAGCATTCTTTTTTCATATCGATTCCACTTATAGATATGGTA
AATTCGATTCGAAATCATTCGATGGATGCAACGATAAAGGCAATAGAAGAATGGGAGGAAAATAGACAATGA"

with codon table 11 will translate to:

a="LDYLIEGLSYNPVDHTATSGPTVMEAALIAKHVYSGEKGDELPGGWKMLEDPYMVGGLRMGVYGRKGEDGEMEYVIANAGTEPTSLIDWENN
LKQPFGKSEDMKNSLAFVEEFMKNNPSINVTFVGHSKGGAEAAANAVLTNRNAILFNPATVNLESYLKPYGVNKSNYTAEMTAFIVEDEILNNIFG
FISTPIDKVVYLPRQHSFFISIPLIDMVNSIRNHSMDATIKAIEEWEENRQ"

while the translation on the GenBank file is:

b="MDYLIEGLSYNPVDHTATSGPTVMEAALIAKHVYSGEKGDELPGGWKMLEDPYMVGGLRMGVYGRKGEDGEMEYVIANAGTEPTSLIDWENN
LKQPFGKSEDMKNSLAFVEEFMKNNPSINVTFVGHSKGGAEAAANAVLTNRNAILFNPATVNLESYLKPYGVNKSNYTAEMTAFIVEDEILNNIFG
FISTPIDKVVYLPRQHSFFISIPLIDMVNSIRNHSMDATIKAIEEWEENRQ"

causing the test a == b to fail. The sequences are exactly the same with
the exception of the initial amino acid.

I could do the test in other ways and remove the initial letter, but 
that wouldn't work globally.

So, is this the right behavior or am I missing something?

Any other suggestions to do this test will also help.

Thanks
--
Renato Alves

From biopython at maubp.freeserve.co.uk  Mon Jan 28 05:40:28 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 28 Jan 2008 10:40:28 +0000
Subject: [BioPython] Translation issues
In-Reply-To: <479DA75A.6070804@igc.gulbenkian.pt>
References: <479DA75A.6070804@igc.gulbenkian.pt>
Message-ID: <320fb6e00801280240q785d7850g2b48016c7eefd90d@mail.gmail.com>

On 1/28/08, Renato Alves <rjalves at igc.gulbenkian.pt> wrote:
> Hi.
>
> I'm trying to automate and validate the process of translating
> sequences downloaded from NCBI.  ...  The problem is
> that the starting amino acid in NCBI is always M, but with the Translation
> module it isn't, even if the codon is marked as "starting" in the
> corresponding codon table.
>
> So, is this the right behavior or am I missing something?

Sadly, that is just the way the translation module works.  This is a
fairly common problem, and it's one I was planning to try and "fix" as
part of Bug 2381
http://bugzilla.open-bio.org/show_bug.cgi?id=2381

I would like some comments on the ideas on that bug - for example
would you prefer separate methods/functions for blind translation,
translation until a stop codon, and translation from a start codon
which is treated as an M - or a single method with lots of optional
arguments?

> Any other suggestions to do this test will also help.

Right now, I would check the start codon yourself and then use an M
when translating the sequence.  Remember the codon table (table 11 in
your example) should have all the valid start codons defined.
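The check Peter suggests can be sketched without Bio.Translate as a small self-contained translator: look up each codon in the standard genetic code (NCBI tables 1 and 11 assign the same amino acids and differ only in their start codons), and translate the first codon as M when it is one of the table's start codons. The names translate_cds and TABLE_11_STARTS are hypothetical; the start-codon set is taken from NCBI's definition of table 11:

```python
from itertools import product

# Standard genetic code in NCBI's TCAG ordering (first base varies slowest).
# NCBI tables 1 and 11 assign the same amino acids; they differ in starts.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Start codons NCBI lists for table 11 (bacterial, archaeal, plant plastid).
TABLE_11_STARTS = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def translate_cds(dna, start_codons=TABLE_11_STARTS):
    """Translate a CDS up to the first stop codon.

    If the first codon is in start_codons it becomes M, matching the
    GenBank translation in the example above. Ambiguous bases such as N
    are not handled (they raise KeyError).
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        if i == 0 and codon in start_codons:
            protein.append("M")
            continue
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate_cds("TTGGATTATTTAATAGAG"))                     # MDYLIE
print(translate_cds("TTGGATTATTTAATAGAG", start_codons=set()))  # LDYLIE
```

Passing an empty start_codons set reproduces the "blind" translation that gave L for the leading TTG.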

Peter

From bsouthey at gmail.com  Mon Jan 28 09:42:22 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Mon, 28 Jan 2008 08:42:22 -0600
Subject: [BioPython] Translation issues
In-Reply-To: <479DA75A.6070804@igc.gulbenkian.pt>
References: <479DA75A.6070804@igc.gulbenkian.pt>
Message-ID: <bbcd77d00801280642t72d7126btdf4f105ea11e6e55@mail.gmail.com>

Hi,
Please see:
http://en.wikipedia.org/wiki/Start_codon
"In addition to AUG, alternative start codons, mainly GUG and UUG are
used in prokaryotes. For example E. coli uses 77% ATG (AUG), 14% GTG
(GUG), 8% TTG (UUG) and a few others."

Really the only way is to compare the sequences after the first
position (a[1:]==b[1:]) assuming you expect an exact match.
Alternatively you need to perform some type of alignment and flag
unexpected differences.
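Bruce's slicing test can be made a little more informative by reporting where the two translations disagree instead of returning a single True/False. A minimal sketch that, like a[1:] == b[1:], assumes both proteins have the same length and ignores the first residue; the function name is hypothetical:

```python
def translation_mismatches(ours, reference):
    """Return 0-based positions where two translations differ, skipping position 0.

    Position 0 is skipped because an alternative start codon (e.g. TTG, GTG)
    is annotated as M in GenBank but reads as L or V when translated blindly.
    """
    if len(ours) != len(reference):
        raise ValueError(f"lengths differ: {len(ours)} vs {len(reference)}")
    return [i for i in range(1, len(ours)) if ours[i] != reference[i]]


print(translation_mismatches("LDYLIE", "MDYLIE"))  # []
print(translation_mismatches("LDYLIE", "MDYLKE"))  # [4]
```

An empty list means the sequences agree apart from the start residue; sequences with indels still need an alignment, as Bruce notes.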

Regards
Bruce

On Jan 28, 2008 3:58 AM, Renato Alves <rjalves at igc.gulbenkian.pt> wrote:
> Hi.
>
> I'm trying to automate and validate the process of translating
> sequences downloaded from NCBI.
>
> Basically I fetch a GenBank file, extract the DNA sequences and use the
> Translation module of BioPython to check if the translation matches. The
> problem is that the starting amino acid in NCBI is always M, but with the
> Translation module it isn't, even if the codon is marked as "starting" in
> the corresponding codon table.
>
> So for instance, the sequence:
>
> "TTGGATTATTTAATAGAGGGTTTAAGTTATAATCCTGTAGACCACACAGCTACATCTGGACCAACTGTAATGGAAGCTGCACTGATTGCTAA
> ACATGTTTATTCAGGGGAAAAAGGAGATGAATTACCCGGTGGATGGAAAATGCTTGAAGATCCATATATGGTTGGAGGTCTTCGAATGGGC
> GTATATGGGAGAAAAGGTGAGGATGGAGAGATGGAATATGTAATTGCAAATGCAGGAACAGAACCTACTAGTTTGATAGATTGGGAGAATA
> ATTTGAAACAACCTTTTGGGAAATCAGAAGATATGAAAAATTCTTTAGCTTTTGTTGAAGAGTTTATGAAAAACAATCCAAGTATTAATGTAA
> CATTTGTTGGACATTCAAAAGGTGGGGCTGAAGCAGCTGCAAATGCGGTACTTACAAATAGGAATGCAATACTATTTAATCCTGCCACAGTG
> AACTTAGAATCATATTTAAAGCCATATGGTGTGAACAAGTCAAATTATACTGCTGAGATGACGGCATTTATTGTAGAAGACGAAATTTTGAATA
> ATATCTTTGGATTTATATCAACGCCGATAGACAAGGTAGTTTATTTACCCAGACAGCATTCTTTTTTCATATCGATTCCACTTATAGATATGGTA
> AATTCGATTCGAAATCATTCGATGGATGCAACGATAAAGGCAATAGAAGAATGGGAGGAAAATAGACAATGA"
>
> with codon table 11 will translate to:
>
> a="LDYLIEGLSYNPVDHTATSGPTVMEAALIAKHVYSGEKGDELPGGWKMLEDPYMVGGLRMGVYGRKGEDGEMEYVIANAGTEPTSLIDWENN
> LKQPFGKSEDMKNSLAFVEEFMKNNPSINVTFVGHSKGGAEAAANAVLTNRNAILFNPATVNLESYLKPYGVNKSNYTAEMTAFIVEDEILNNIFG
> FISTPIDKVVYLPRQHSFFISIPLIDMVNSIRNHSMDATIKAIEEWEENRQ"
>
> while the translation on the GenBank file is:
>
> b="MDYLIEGLSYNPVDHTATSGPTVMEAALIAKHVYSGEKGDELPGGWKMLEDPYMVGGLRMGVYGRKGEDGEMEYVIANAGTEPTSLIDWENN
> LKQPFGKSEDMKNSLAFVEEFMKNNPSINVTFVGHSKGGAEAAANAVLTNRNAILFNPATVNLESYLKPYGVNKSNYTAEMTAFIVEDEILNNIFG
> FISTPIDKVVYLPRQHSFFISIPLIDMVNSIRNHSMDATIKAIEEWEENRQ"
>
> causing the test a == b to fail. The sequences are exactly the same with
> the exception of the initial amino acid.
>
> I could do the test in other ways and remove the initial letter, but
> that wouldn't work globally.
>
> So, is this the right behavior or am I missing something?
>
> Any other suggestions to do this test will also help.
>
> Thanks
> --
> Renato Alves

From rjalves at igc.gulbenkian.pt  Mon Jan 28 10:37:57 2008
From: rjalves at igc.gulbenkian.pt (Renato Alves)
Date: Mon, 28 Jan 2008 15:37:57 +0000
Subject: [BioPython] Translation issues
In-Reply-To: <320fb6e00801280240q785d7850g2b48016c7eefd90d@mail.gmail.com>
References: <479DA75A.6070804@igc.gulbenkian.pt>
	<320fb6e00801280240q785d7850g2b48016c7eefd90d@mail.gmail.com>
Message-ID: <479DF6D5.7020709@igc.gulbenkian.pt>

Peter wrote:
> Sadly, that is just the way the translation module works.  This is a
> fairly common problem, and it's one I was planning to try and "fix" as
> part of Bug 2381
> http://bugzilla.open-bio.org/show_bug.cgi?id=2381
>   
In this case, I guess something could test whether the first codon is one
of the codon table's start codons and, if so, translate it as "M". But this
is a very naive and specific approach, and I don't know whether it could
break other uses of this function.
> I would like some comments on the ideas on that bug - for example
> would you prefer separate methods/functions for blind translation,
> translation until a stop codon, and translation from a start codon
> which is treated as an M - or a single method with lots of optional
> arguments?
>   
I don't have the expertise to distinguish the pros and cons between the 
two approaches.
Still, in terms of potential user friendliness, I would go for separate 
methods/functions to keep the task simple and obvious.
> Right now, I would check the start codon yourself and then use an M
> when translating the sequence.  Remember the codon table (table 11 in
> your example) should have all the valid start codons defined.
>   
I'm adopting the technique suggested by Bruce Southey to work around this
particular problem. Still, this wouldn't work in more elaborate cases
like some of the ones described in the bug thread you mentioned.

Still, many thanks for the quick and clean answers.

Renato

From rjalves at igc.gulbenkian.pt  Mon Jan 28 12:42:05 2008
From: rjalves at igc.gulbenkian.pt (Renato Alves)
Date: Mon, 28 Jan 2008 17:42:05 +0000
Subject: [BioPython] Alphabet Checking
Message-ID: <479E13ED.2080908@igc.gulbenkian.pt>

/var/lib/python-support/python2.4/Bio/Translate.py in translate_to_stop(self, seq)
     34     def translate_to_stop(self, seq):
     35         # This doesn't have a stop encoding
---> 36         assert seq.alphabet == self.table.nucleotide_alphabet, \
     37                "cannot translate from given alphabet (have %s, need %s)" %\
     38                (seq.alphabet, self.table.nucleotide_alphabet)

AssertionError: cannot translate from given alphabet (have IUPACAmbiguousDNA(), need IUPACAmbiguousDNA())

Aren't those two exactly equal?

Matching references doesn't seem to work as expected :(

What I did:

from Bio.Alphabet.IUPAC import IUPACAmbiguousDNA
from Bio import Translate
from Bio import Seq

a=Seq.Seq("ATCGGATGA...ATGCAGT",alphabet=IUPACAmbiguousDNA())
b=Translate.ambiguous_dna_by_id[11]

b.translate_to_stop(a) ... error pops out

The only way around I was able to find is:

b.table.nucleotide_alphabet=a.alphabet

I guess this is a bad day :( it's the second clash with the Translate 
module in the same day :|

Should I report this as bug?

From p.j.a.cock at googlemail.com  Mon Jan 28 12:56:11 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 28 Jan 2008 17:56:11 +0000
Subject: [BioPython] Alphabet Checking
In-Reply-To: <479E13ED.2080908@igc.gulbenkian.pt>
References: <479E13ED.2080908@igc.gulbenkian.pt>
Message-ID: <320fb6e00801280956m4dec2c1eu79c89396e8a4f72f@mail.gmail.com>

> Aren't those two exactly equal?
>
> Matching references doesn't seem to work as expected :(

That does look like a bug...

> The only way around I was able to find is:

Another option,

from Bio import Translate
from Bio import Seq

trans=Translate.ambiguous_dna_by_id[11]
a=Seq.Seq("ATCGGATGAATGCAGT",alphabet=trans.table.nucleotide_alphabet)
print trans.translate_to_stop(a)
print trans.translate(a)

> I guess this is a bad day :( it's the second clash with the Translate
> module in the same day :|

I don't like the Bio.Translate module either.

> Should I report this as bug?

Please do.  If we do just add translation to the seq object (bug 2381)
and deprecate the Bio.Translate module then in a sense this problem
goes away ;)

Peter

From tiagoantao at gmail.com  Mon Jan 28 13:10:56 2008
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Mon, 28 Jan 2008 18:10:56 +0000
Subject: [BioPython] Alphabet Checking
In-Reply-To: <320fb6e00801280956m4dec2c1eu79c89396e8a4f72f@mail.gmail.com>
References: <479E13ED.2080908@igc.gulbenkian.pt>
	<320fb6e00801280956m4dec2c1eu79c89396e8a4f72f@mail.gmail.com>
Message-ID: <6d941f120801281010r6e8e829dub26a85e6a0b61983@mail.gmail.com>

On Jan 28, 2008 5:56 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> > Aren't those two exactly equal?
> >
> > Matching references doesn't seem to work as expected :(
>
> That does look like a bug...

It is probably completely unrelated, but it might not be...

From a "helicopter view" of the code, I have noticed that SeqIO uses
Nexus in some cases.

I patched a previous Nexus bug by using deepcopy, which could
cause something like this:

AssertionError: cannot translate from given alphabet (have IUPACAmbiguousDNA(), need IUPACAmbiguousDNA())

(i.e., it has the same type name, but is really not the same object)

Again, it is probably unrelated (I know very little about Bio.Seq and
Bio.SeqIO), but, just in case...
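Tiago's guess is easy to illustrate: if a class does not define __eq__, Python's == falls back to identity, so two instances that print identically (for example, an original and its copy.deepcopy) still compare unequal. Whether the Biopython 1.44 alphabet classes behave exactly this way is an assumption; the classes below are hypothetical stand-ins, not Bio.Alphabet:

```python
import copy


class AmbiguousDNA:
    """Hypothetical stand-in for an alphabet class that does not define __eq__."""

    def __repr__(self):
        return "IUPACAmbiguousDNA()"


class ComparableAmbiguousDNA(AmbiguousDNA):
    """Same alphabet, but instances of the same class compare equal."""

    def __eq__(self, other):
        return type(self) is type(other)

    def __hash__(self):
        return hash(type(self))


plain = AmbiguousDNA()
plain_copy = copy.deepcopy(plain)
# Same printed form, but == compares identity, so this is False.
print(plain, plain_copy, plain == plain_copy)

fixed = ComparableAmbiguousDNA()
# Compared by class rather than identity, so a deep copy is equal.
print(fixed == copy.deepcopy(fixed))
```

This is also why Peter's workaround succeeds: building the Seq with trans.table.nucleotide_alphabet hands the assertion the very same object it compares against.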

From mjldehoon at yahoo.com  Mon Jan 28 19:43:22 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 28 Jan 2008 16:43:22 -0800 (PST)
Subject: [BioPython] Bio.EUtils
In-Reply-To: <d9fd76050801281054q1ba7a498yc54f2ca24d3f8d5d@mail.gmail.com>
Message-ID: <356164.74184.qm@web62403.mail.re1.yahoo.com>


Rohini Damle <rohini.damle at gmail.com> wrote:
>> The following does almost the same with Bio.WWW.NCBI instead of Bio.EUtils:
>> ...
>> My last question is: Is this sufficient for your needs? Or do you see some advantage to using Bio.EUtils over Bio.WWW.NCBI?
> I guess Bio.EUtils is faster and can be used for batch processing (like fetching records for a list of pubmed ids). I have not tried Bio.WWW.NCBI; I will try it and get back to you.

If you make the following modification in Bio.WWW.NCBI.py:
line 189: replace 
    options = urllib.urlencode(params)
by
    options = urllib.urlencode(params, doseq=1)
then Bio.WWW.NCBI can also fetch records for a list of pubmed ids. I'm guessing that then it is as fast as (or faster than) Bio.EUtils, but I'd be interested in what you find in practice.
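The effect of the doseq change can be seen with the standard library alone. A small sketch using Python 3's urllib.parse.urlencode, which takes the same doseq flag as the Python 2 urllib.urlencode call quoted above; the parameter values are illustrative, and whether NCBI treats repeated id parameters exactly like a comma-separated list is not verified here:

```python
from urllib.parse import urlencode  # Python 3 home of Python 2's urllib.urlencode

params = {"db": "pubmed", "id": ["18211820", "18211718"], "retmode": "xml"}

# Without doseq the list is converted with str() and sent as one odd value,
# e.g. id=%5B%2718211820%27%2C+%2718211718%27%5D
print(urlencode(params))

# With doseq each list element becomes its own id=... pair.
print(urlencode(params, doseq=True))
# prints: db=pubmed&id=18211820&id=18211718&retmode=xml
```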

Thanks,

--Michiel



From bsantos at biocant.pt  Tue Jan 29 06:34:07 2008
From: bsantos at biocant.pt (Bruno Santos)
Date: Tue, 29 Jan 2008 11:34:07 -0000
Subject: [BioPython] Problems runing BLAST
Message-ID: <000101c8626a$dd98e760$98cab620$@pt>

I am once more having problems running blast using biopython. I start the
script and the blastall process starts, but after a few minutes it starts
sleeping and no error message is passed. When I check the xml file, it only
contains part of the results for the first sequence.
Has anyone ever had the same problem?

I'm using:
python 2.5.1
biopython 1.44
blastall 2.2.16

My code is the following:

from Bio import SeqIO
from Bio.Blast import NCBIStandalone
from Bio.Blast import NCBIXML
import time
import math
import os


primer = 'D2'
sample = 'AGC'

#Defines all the databases that will be used (one quoted, space-separated list)
my_blast_db = ('"/home/bsantos/DataBases/nt.00'
               ' /home/bsantos/DataBases/nt.01 /home/bsantos/DataBases/nt.02'
               ' /home/bsantos/DataBases/nt.03 /home/bsantos/DataBases/nt.04'
               ' /home/bsantos/DataBases/nt.05 /home/bsantos/DataBases/RDPIIdb'
               ' /home/bsantos/DataBases/RNADB"')
print my_blast_db


#Define the fasta file to Blast
destination = ('/home/bsantos/Metagenomics/Results/' + sample + '/' + primer +
               '/filteredfile_sample' + sample + '_' + primer + '_F.fasta')
my_blast_file = destination

#Defines the blast binaries
my_blast_exe = "/usr/local/bin/blastall"

print os.path.exists(my_blast_exe)
print time.ctime()

#Performs Blast
print 'Now Performing Blast'
result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn",
                                                    my_blast_db, my_blast_file)
print 'These errors have occurred:'
print error_info.read()
print 'Starting parsing the results.......'
#Parse the result of the blast in XML format
blast_results = result_handle.read() #Catch the results
save_file = open('/home/bsantos/Metagenomics/Results/' + sample + '/' +
                 primer + '/BlastReport_sample' + sample + '_' + primer +
                 '_F.xml', 'w')
save_file.write(blast_results) #Write all the information to an XML file
save_file.flush()
save_file.close()


From biopython at maubp.freeserve.co.uk  Tue Jan 29 07:15:26 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 29 Jan 2008 12:15:26 +0000
Subject: [BioPython] Problems runing BLAST
In-Reply-To: <000101c8626a$dd98e760$98cab620$@pt>
References: <000101c8626a$dd98e760$98cab620$@pt>
Message-ID: <320fb6e00801290415g10e099dj108ecea15a72109c@mail.gmail.com>

Hi Bruno,

On Jan 29, 2008 11:34 AM, Bruno Santos <bsantos at biocant.pt> wrote:
> I am once more having problems running blast using biopython. I start the
> script the blastall process starts and after a few minutes it starts
> sleeping and no error message is passed. When I check the xml file it only
> writes part of the results for the first sequence.

Have you tried running the same command "by hand" at the command line,
to check that it works, and to time how long you should expect it to
take?

> Does anyone has ever had the same problem?

I think the problem is to do with asking the operating system to read
all the error output.  Try commenting out this bit, and only read the
error handle if you have a problem:

# print error_info.read()

Quoting from the tutorial,

>> The error info can be hard to deal with, because if you try to do a
>> error_handle.read() and there was no error info returned, then the read()
>> call will block and not return, locking your script. In my opinion, the
>> best way to deal with the error is only to print it out if you are not
>> getting result_handle results to be parsed, but otherwise to leave it alone.

Peter
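The blocking described above is a general pipe problem: if a program fills one pipe while the caller is waiting on the other, both sides stall. A minimal sketch of the usual remedy in modern Python, subprocess.Popen.communicate(), which drains stdout and stderr together. This is not the Biopython 1.44 NCBIStandalone.blastall API, and the function name run_and_capture is hypothetical:

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd and return (stdout, stderr) as text without risking a pipe deadlock."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    # communicate() reads both pipes until EOF, so a chatty stderr (or a
    # silent one) can never block the stdout read, or vice versa.
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(f"{cmd[0]} exited with {proc.returncode}: {err.strip()}")
    return out, err


if __name__ == "__main__":
    # Demo: a child that writes far more than a typical pipe buffer to
    # stdout plus a short message to stderr.
    child = "import sys; sys.stdout.write('x' * 200000); sys.stderr.write('done')"
    out, err = run_and_capture([sys.executable, "-c", child])
    print(len(out), err)
```

For legacy blastall, an argument list such as ["blastall", "-p", "blastn", "-d", databases, "-i", fasta_path, "-m", "7"] (where -m 7 requests XML output) could be passed in; databases and fasta_path are placeholders.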

From jblanca at btc.upv.es  Wed Jan 30 04:15:49 2008
From: jblanca at btc.upv.es (Jose Blanca)
Date: Wed, 30 Jan 2008 10:15:49 +0100
Subject: [BioPython] blast parse
Message-ID: <200801301015.50812.jblanca@btc.upv.es>

Hi:
I'm new to the list and to biopython. I come from perl and I'm liking python a 
lot.
I'm trying to read a big blast file and it takes a lot of time and memory. I'm 
not sure if I'm taking the most efficient path. Basically I'm doing:

blasth = file('blast.xml', 'r')
from Bio.Blast import NCBIXML
p = NCBIXML.BlastParser()
blast_parse = p.parse(blasth)
for blast_result in blast_parse:
	#do whatever

I was expecting to read the records one by one, but the call to 
p.parse(blasth) takes a lot of time and memory. I'm not sure what this 
function returns, a list or an iterator. I've looked at the NCBIXML.py file 
and the BlastParser class has two parse methods (am I wrong?).

    def parse(self, handler):
        """Parses the XML data

        handler -- file handler or StringIO

        This method returns a list of Blast record objects.
        """

def parse(handle, debug=0):
    """Returns an iterator a Blast record for each query.

    handle - file handle to and XML file to parse
    debug - integer, amount of debug information to print

    This is a generator function that returns multiple Blast records
    objects - one for each query sequence given to blast.  The file
    is read incrementally, returning complete records as they are read
    in.

I guess that the first function would read the complete file before returning 
anything, while the second should read and return the records one by one. I 
don't know if this guess is correct.
Is there another way to read these huge blast files without using so much 
memory?
Best regards,

-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)

From mjldehoon at yahoo.com  Wed Jan 30 04:56:56 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 30 Jan 2008 01:56:56 -0800 (PST)
Subject: [BioPython] blast parse
In-Reply-To: <200801301015.50812.jblanca@btc.upv.es>
Message-ID: <940738.9737.qm@web62407.mail.re1.yahoo.com>

Dear Jose,

To get the records one-by-one, use

from Bio.Blast import NCBIXML
blast_parse = NCBIXML.parse(blasth)
for blast_result in blast_parse:
    # do whatever with blast_result

This avoids having to read the complete XML file all at once.
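For readers curious how incremental parsing keeps memory flat, here is a minimal sketch of the same idea using the standard library's xml.etree.ElementTree.iterparse: each <Iteration> element (one per query in NCBI's BLAST XML layout) is handled as soon as its closing tag is read and then cleared. This is not how Bio.Blast.NCBIXML is implemented, and iter_blast_queries is a hypothetical helper that extracts only the query definition and the hit definitions:

```python
import io
import xml.etree.ElementTree as ET


def iter_blast_queries(handle):
    """Yield (query_def, [hit_def, ...]) for each <Iteration>, one at a time."""
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == "Iteration":
            query = elem.findtext("Iteration_query-def", default="")
            hits = [hit.findtext("Hit_def", default="") for hit in elem.iter("Hit")]
            yield query, hits
            elem.clear()  # drop this record's subtree before reading the next


SAMPLE = b"""<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>seq1</Iteration_query-def>
      <Iteration_hits>
        <Hit><Hit_def>hitA</Hit_def></Hit>
        <Hit><Hit_def>hitB</Hit_def></Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_query-def>seq2</Iteration_query-def>
      <Iteration_hits></Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""

if __name__ == "__main__":
    for query, hits in iter_blast_queries(io.BytesIO(SAMPLE)):
        print(query, hits)
```

On a real report the handle would be a file opened in binary mode, e.g. open("blast.xml", "rb").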

To the developers:
We should probably think about removing the NCBIXML.BlastParser.parse, and perhaps adding a NCBIXML.read function to read exactly one record from the XML file.
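A read function of the kind proposed here could be a thin wrapper around the incremental parser that insists on exactly one record. A minimal, library-agnostic sketch; the name read_exactly_one is hypothetical, and it would be used as read_exactly_one(NCBIXML.parse(handle)):

```python
def read_exactly_one(records):
    """Return the only item from an iterable of records.

    Raises ValueError if there are no records or more than one.
    """
    iterator = iter(records)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("No records found") from None
    try:
        next(iterator)
    except StopIteration:
        return first
    raise ValueError("More than one record found")
```

Because it only pulls at most two items, it never reads past the second record of a large file.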

--Michiel.

Jose Blanca <jblanca at btc.upv.es> wrote: Hi:
I'm new to the list and to biopython. I come from perl and I'm liking python a 
lot.
I'm trying to read a big blast file and it takes a lot of time and memory. I'm 
not sure if I'm taking the most efficient path. Basically I'm doing:

blasth = file('blast.xml', 'r')
from Bio.Blast import NCBIXML
p = NCBIXML.BlastParser()
blast_parse = p.parse(blasth)
for blast_result in blast_parse:
 #do whatever

I was expecting to read the records one by one, but the call to 
p.parse(blasth) takes a lot of time and memory. I'm not sure what this 
function returns, a list or an iterator. I've looked at the NCBIXML.py file 
and the BlastParser class has two parse methods (am I wrong?).

    def parse(self, handler):
        """Parses the XML data

        handler -- file handler or StringIO

        This method returns a list of Blast record objects.
        """

def parse(handle, debug=0):
    """Returns an iterator a Blast record for each query.

    handle - file handle to and XML file to parse
    debug - integer, amount of debug information to print

    This is a generator function that returns multiple Blast records
    objects - one for each query sequence given to blast.  The file
    is read incrementally, returning complete records as they are read
    in.

I guess that the first function would read the complete file before returning 
anything, while the second should read and return the records one by one. I 
don't know if this guess is correct.
Is there another way to read these huge blast files without using so much 
memory?
Best regards,

-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)

From lueck at ipk-gatersleben.de  Wed Jan 30 05:24:55 2008
From: lueck at ipk-gatersleben.de (=?iso-8859-1?Q?Stefanie_L=FCck?=)
Date: Wed, 30 Jan 2008 11:24:55 +0100
Subject: [BioPython] Clustalw pair wise alignment
Message-ID: <000d01c8632a$5bcbac70$1022a8c0@ipkgatersleben.de>

Hi!

I'm working with clustalw and everything works fine. Now I have some questions:

1) Must the input data be in a file, or can it also be in the code (e.g. in a list)?

2) I ask because I want to do many (up to hundreds of) pairwise alignments of short sequences, and I don't want to store each of them in a separate file.

If I have them in one file, clustalw makes a multiple alignment:

Match1  ------CAAGATTTGAGCACCACAGGCAA---
full1   ------CAAGATTTGAGCACCACAGGCAACAG
Match0  AGCCTTCAAGATTTGAGCACCACAG-------
full0   AGCCTTCAAGATTTGAGCACCACAG-------

whereas Match1 should only align to full1, and so on.

Could someone give me a hint?


Regards 

Stefanie

 
From biopython at maubp.freeserve.co.uk  Wed Jan 30 06:47:42 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 30 Jan 2008 11:47:42 +0000
Subject: [BioPython] Clustalw pair wise alignment
In-Reply-To: <000d01c8632a$5bcbac70$1022a8c0@ipkgatersleben.de>
References: <000d01c8632a$5bcbac70$1022a8c0@ipkgatersleben.de>
Message-ID: <320fb6e00801300347h6f1ec197qc599ec9f2c80bab@mail.gmail.com>

Hi Stefanie

> I'm working with clustalw and everything works fine. Now I have some questions:
>
> 1) Must the input data be in a file or can it also be in the code (e.g. in a list)?

I believe for the Clustalw command line tool, you have to supply the
input data in a file.

> 2) I ask because I want to do many (up to hundreds of) pairwise
> alignments of short sequences, and I don't want to store
> each of them in a separate file.
>
> If I have it in one file, clustalw make a multiple alignment:

Yes, that is expected for clustalw.

> Could someone give a hint?

If you want to use Clustalw, you could re-use a temporary file for
each pair of sequences (rather than creating hundreds of different
input files).
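
A minimal sketch of the reuse-one-temporary-file idea (Python 3). It assumes a `clustalw` executable on the PATH that accepts the `-INFILE=` option; `write_pair` and `align_pairs` are hypothetical helper names, and the command runner is passed in so the file handling can be tried without Clustalw installed.

```python
import os
import subprocess
import tempfile


def write_pair(path, name_a, seq_a, name_b, seq_b):
    """Overwrite `path` with a two-record FASTA file."""
    with open(path, "w") as handle:
        handle.write(">%s\n%s\n>%s\n%s\n" % (name_a, seq_a, name_b, seq_b))


def align_pairs(pairs, runner=subprocess.run):
    """Align each (name_a, seq_a, name_b, seq_b) tuple on its own.

    A single temporary input file is reused for every pair, so Clustalw
    only ever sees two sequences at a time.  `runner` receives the command
    list; it must read Clustalw's output before the next pair overwrites it.
    """
    fd, path = tempfile.mkstemp(suffix=".fasta")
    os.close(fd)
    results = []
    try:
        for pair in pairs:
            write_pair(path, *pair)
            # Assumed Clustalw option; the .aln/.dnd outputs land next to the input.
            results.append(runner(["clustalw", "-INFILE=" + path]))
    finally:
        os.remove(path)  # note: this does not delete the .aln/.dnd files
    return results


pairs = [
    ("Match1", "CAAGATTTGAGCACCACAGGCAA", "full1", "CAAGATTTGAGCACCACAGGCAACAG"),
    ("Match0", "AGCCTTCAAGATTTGAGCACCACAG", "full0", "AGCCTTCAAGATTTGAGCACCACAG"),
]
# With Clustalw installed: align_pairs(pairs)
```

Injecting the runner keeps the loop testable; in real use the runner would call Clustalw and then parse the .aln file before returning.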

I would consider using the EMBOSS tools "needle" or "water" for doing
pairwise alignments.  These have the advantage that you can actually
supply the sequences as part of the command line (provided they are not
too long).  See http://emboss.sourceforge.net/apps/ and also
http://emboss.sourceforge.net/docs/themes/UniformSequenceAddress.html#asis
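
To show what passing sequences "as is" looks like, here is a small Python 3 sketch that only builds a needle command line. The `needle_command` helper is hypothetical, and the gap penalties are assumed values (check `needle -help` for your EMBOSS version); `-stdout` and `-auto` are EMBOSS general qualifiers for writing to standard output and suppressing prompts.

```python
def needle_command(seq_a, seq_b, gapopen=10.0, gapextend=0.5):
    """Build an EMBOSS needle command line with both sequences inline.

    The asis:: prefix is EMBOSS's Uniform Sequence Address for a literal
    sequence, so no input files are needed.
    """
    return [
        "needle",
        "-asequence", "asis::" + seq_a,
        "-bsequence", "asis::" + seq_b,
        "-gapopen", str(gapopen),    # assumed gap penalties; tune as needed
        "-gapextend", str(gapextend),
        "-stdout",                   # write the alignment to standard output
        "-auto",                     # do not prompt for any missing values
    ]


cmd = needle_command("CAAGATTTGAGCACCACAGGCAA", "CAAGATTTGAGCACCACAGGCAACAG")
print(" ".join(cmd))
```

With EMBOSS installed, `subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True).stdout` would hold the alignment text for one pair; that call is not exercised here.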

Peter

From winter at biotec.tu-dresden.de  Wed Jan 30 07:48:34 2008
From: winter at biotec.tu-dresden.de (Christof Winter)
Date: Wed, 30 Jan 2008 13:48:34 +0100
Subject: [BioPython] blast parse
In-Reply-To: <940738.9737.qm@web62407.mail.re1.yahoo.com>
References: <940738.9737.qm@web62407.mail.re1.yahoo.com>
Message-ID: <47A07222.9000200@biotec.tu-dresden.de>

Michiel de Hoon wrote:
> Dear Jose,
> 
> To get the records one-by-one, use
> 
> from Bio.Blast import NCBIXML
> blast_parse = NCBIXML.parse(blasth)
> for blast_result in blast_parse:
>     pass  # do whatever with blast_result
> 
> This avoids having to read the complete XML file all at once.
> 
> To the developers: We should probably think about removing the
> NCBIXML.BlastParser.parse, and perhaps adding a NCBIXML.read function to read
> exactly one record from the XML file.

I think removing NCBIXML.BlastParser.parse is a good idea.
We should keep it simple.

Christof
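
The record-at-a-time idea can be illustrated with the standard library alone (Python 3). This is not Biopython's NCBIXML: it streams BLAST XML with `xml.etree.ElementTree.iterparse`, assuming a single well-formed document with one `<Iteration>` element per query (tag names as in NCBI's BLAST XML). `read_one` is a hypothetical analogue of the proposed `NCBIXML.read`.

```python
import io
import xml.etree.ElementTree as ET


def iter_queries(handle):
    """Yield (query_def, hit_count) for each <Iteration>, one at a time."""
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == "Iteration":
            query = elem.findtext("Iteration_query-def", default="")
            hits = len(elem.findall("Iteration_hits/Hit"))
            yield query, hits
            elem.clear()  # discard the finished record's contents


def read_one(handle):
    """Return the only record; raise ValueError for zero or several."""
    records = iter_queries(handle)
    first = next(records, None)
    if first is None:
        raise ValueError("No records found")
    if next(records, None) is not None:
        raise ValueError("More than one record found")
    return first


EXAMPLE = b"""<?xml version="1.0"?>
<BlastOutput><BlastOutput_iterations>
<Iteration><Iteration_query-def>q1</Iteration_query-def>
<Iteration_hits><Hit/><Hit/></Iteration_hits></Iteration>
<Iteration><Iteration_query-def>q2</Iteration_query-def>
<Iteration_hits/></Iteration>
</BlastOutput_iterations></BlastOutput>"""

for query, hits in iter_queries(io.BytesIO(EXAMPLE)):
    print(query, hits)
```

Keeping both `parse`-style iteration and a strict `read` mirrors the split discussed above: iterate when there may be many records, and use the strict reader when exactly one is expected.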


From lee.byung-chul at kaist.ac.kr  Wed Jan  2 11:00:37 2008
From: lee.byung-chul at kaist.ac.kr (Lee,Byung-chul)
Date: Wed, 02 Jan 2008 20:00:37 +0900
Subject: [BioPython] FormatConverter: from Fasta format to ClustalW format
Message-ID: <477B6ED5.8080005@kaist.ac.kr>


Dear colleagues.

I want to use AlignInfo.SummaryInfo on a FASTA-format alignment file.
I think that to do this, the FASTA file should first be
converted to ClustalW format, so I tried to use FormatConverter.
However, I could not get it to work.
Here is what I did:

----
#!/usr/bin/env python

from Bio import Fasta
from Bio.Align.FormatConvert import FormatConverter
from Bio.Alphabet import IUPAC

alignment = Fasta.FastaAlign.parse_file('tmp.fasta',type='PROTEIN')
converter = FormatConverter(alignment)
clw_align = converter.to_clustal()

print clw_align
----
and tmp.fasta is
---
>seq2
DAC
>seq3
DC-
>seq1
DAD
>seq4
DDD

But an error occurred.
The error messages are below:
---
Traceback (most recent call last):
  File "tmp.py", line 7, in <module>
    alignment = Fasta.FastaAlign.parse_file('tmp.fasta', type='PROTEIN')
  File "/var/lib/python-support/python2.5/Bio/Fasta/FastaAlign.py", line 48, in parse_file
    cur_align = iterator.next()
  File "/var/lib/python-support/python2.5/Bio/Fasta/__init__.py", line 72, in next
    result = self._iterator.next()
  File "/var/lib/python-support/python2.5/Martel/IterParser.py", line 152, in iterateFile
    self.header_parser.parseString(rec)
  File "/var/lib/python-support/python2.5/Martel/Parser.py", line 356, in parseString
    self._err_handler.fatalError(result)
  File "/usr/lib/python2.5/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
    raise exception
Martel.Parser.ParserPositionException: error parsing at or beyond character 0
-----
What should I do? Could you advise me?

Thank you!

Byung chul Lee


From biopython at maubp.freeserve.co.uk  Wed Jan  2 11:54:34 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jan 2008 11:54:34 +0000
Subject: [BioPython] FormatConverter: from Fasta format to ClustalW
	format
In-Reply-To: <477B6ED5.8080005@kaist.ac.kr>
References: <477B6ED5.8080005@kaist.ac.kr>
Message-ID: <320fb6e00801020354v5d7d9dr42034cdf99a86c03@mail.gmail.com>

Hello Byung chul Lee,

On 1/2/08, Lee,Byung-chul wrote:
>
> Dear colleagues.
>
> I want to use the AlignInfo.SummaryInfo for fasta-format alignment file.
> I think that to do the process firstly the fasta format should be
> converted to clustalw format, so I try to use Formatconverter.
> However, at my trial, I cannot do that.

Once you have an alignment object (loaded from any file format), this
should work with AlignInfo.  I don't think you need to convert it from
FASTA to ClustalW.

I would guess the error you saw is a problem with Biopython/Martel and
mxTextTools 3.0, which isn't 100% compatible with mxTextTools 2.0.
What version of Biopython are you using, as I would have expected this
to work fine with Biopython 1.44?

You could also try using Bio.SeqIO to load the FASTA format alignment
file instead, see http://biopython.org/wiki/SeqIO

from Bio import SeqIO
from Bio.Align import AlignInfo
alignment = SeqIO.to_alignment(SeqIO.parse(open('tmp.fasta'), "fasta"))
summary_align = AlignInfo.SummaryInfo(alignment)

Peter


From biopython at maubp.freeserve.co.uk  Wed Jan  2 11:57:46 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jan 2008 11:57:46 +0000
Subject: [BioPython] [BioSQL-l] Authority in biodatabase table
In-Reply-To: <A25B9456-748B-4664-A37F-217C31B70260@gmx.net>
References: <320fb6e00711261110g63c156a1w8b76a797fe12e2b1@mail.gmail.com>
	<A25B9456-748B-4664-A37F-217C31B70260@gmx.net>
Message-ID: <320fb6e00801020357g724917b5s853d99f2f953753a@mail.gmail.com>

On 1/1/08, Hilmar Lapp <hlapp at gmx.net> wrote:
> (Sorry for this long-too-late reply. Going through old email that got
> left unread or unresponded.)
>
> Peter - you probably implemented something meanwhile that suits your
> needs. Just FYI, BioPerl leaves this empty too. The general notion
> for authority is that of the LSID authority field, but of course you
> won't be able to parse this out of any input file. The value for
> SwissProt would be uniprot.org, for example. For NCBI, I'm not sure -
> NCBI hasn't ever issued any LSIDs, but presumably it would be
> something like ncbi.nlm.nih.gov.
>
>        -hilmar

Thank you Hilmar.

It seems that the current code in Biopython is fine (the authority
field is left blank by default, unless the user supplies their own
value), and consistent with both BioPerl and BioJava in this regard
(thanks Richard).

Peter


From lee.byung-chul at kaist.ac.kr  Wed Jan  2 13:44:47 2008
From: lee.byung-chul at kaist.ac.kr (Lee,Byung-chul)
Date: Wed, 02 Jan 2008 22:44:47 +0900
Subject: [BioPython] FormatConverter: from Fasta format to ClustalW
	format
In-Reply-To: <320fb6e00801020354v5d7d9dr42034cdf99a86c03@mail.gmail.com>
References: <477B6ED5.8080005@kaist.ac.kr>
	<320fb6e00801020354v5d7d9dr42034cdf99a86c03@mail.gmail.com>
Message-ID: <477B954F.9020004@kaist.ac.kr>


Thank you very much for your kind reply, Peter.

Following your explanation, I tried to use SeqIO, but another error occurred.
Here is what I did:

-----------------
from Bio import SeqIO
from Bio.Align import AlignInfo
alignment = SeqIO.to_alignment(SeqIO.parse(open('tmp.fasta'), "fasta"))
summary_align = AlignInfo.SummaryInfo(alignment)
print summary_align.dumb_consensus()
--------------------
but the results are
-----------------
Traceback (most recent call last):
  File "tmp.py", line 16, in <module>
    print summary_align.dumb_consensus()
  File "/var/lib/python-support/python2.5/Bio/Align/AlignInfo.py", line 111, in dumb_consensus
    consensus_alpha = self._guess_consensus_alphabet()
  File "/var/lib/python-support/python2.5/Bio/Align/AlignInfo.py", line 189, in _guess_consensus_alphabet
    ("Non-gapped alphabet found in alignment object.")
ValueError: Non-gapped alphabet found in alignment object.
---------------------
In addition, all sequences in my tmp.fasta file have the same length:
-----
>seq2
DAC
>seq3 
DC-
>seq1 
DAD
>seq4
DDD

Is this problem caused by the Biopython/Martel and mxTextTools versions?
I am using biopython 1.43-2 (Ubuntu version) and mxtexttools 3.0.0-2ubuntu1.

What should I do about this? Thanks.

Byung chul.


From biopython at maubp.freeserve.co.uk  Wed Jan  2 17:46:25 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jan 2008 17:46:25 +0000
Subject: [BioPython] FormatConverter: from Fasta format to ClustalW
	format
In-Reply-To: <477B954F.9020004@kaist.ac.kr>
References: <477B6ED5.8080005@kaist.ac.kr>
	<320fb6e00801020354v5d7d9dr42034cdf99a86c03@mail.gmail.com>
	<477B954F.9020004@kaist.ac.kr>
Message-ID: <320fb6e00801020946j5b331137s14f9e1d90e888a2e@mail.gmail.com>

On Jan 2, 2008 1:44 PM, Lee,Byung-chul <lee.byung-chul at kaist.ac.kr> wrote:
> Following your explanation, I tried to use SeqIO, but another error occurred.
> Here is what I did:

My fault, sorry. I wasn't at a computer with Biopython installed, I
had to guess.  I'll try and put together a proper example for you
tomorrow.

> Is this problem caused by the Biopython/Martel and mxTextTools versions?
> I am using biopython 1.43-2 (ubuntu version) and mxtexttools 3.0.0-2ubuntu1.

The original problem you reported was due to the combination of
Biopython 1.43 (the Martel module) and mxTextTools 3.0. You can either
update to Biopython 1.44 or downgrade to mxTextTools 2.0 - neither is
going to be very simple if you want to use the Ubuntu repositories.
To avoid this Martel problem, I would suggest you un-install Biopython
1.43 from the Ubuntu repository, and then install Biopython 1.44 from
source.

Peter


From biopython at maubp.freeserve.co.uk  Fri Jan  4 13:20:26 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 4 Jan 2008 13:20:26 +0000
Subject: [BioPython] FormatConverter: from Fasta format to ClustalW
	format
In-Reply-To: <320fb6e00801020946j5b331137s14f9e1d90e888a2e@mail.gmail.com>
References: <477B6ED5.8080005@kaist.ac.kr>
	<320fb6e00801020354v5d7d9dr42034cdf99a86c03@mail.gmail.com>
	<477B954F.9020004@kaist.ac.kr>
	<320fb6e00801020946j5b331137s14f9e1d90e888a2e@mail.gmail.com>
Message-ID: <320fb6e00801040520i11c9a4c4q4449cee34da00706@mail.gmail.com>

On Jan 2, 2008 5:46 PM, Peter wrote:
> On Jan 2, 2008 1:44 PM, Lee,Byung-chul wrote:
> > Following your explanation, I tried to use SeqIO, but another error occurred.
> > Here is what I did:
>
> My fault, sorry. I wasn't at a computer with Biopython installed, I
> had to guess.  I'll try and put together a proper example for you
> tomorrow.

This should work on Biopython 1.43 or later; I have tested it using
the simple FASTA file you gave earlier:

from Bio.Alphabet.IUPAC import IUPACProtein
from Bio.Alphabet import Gapped
from Bio import SeqIO
from Bio.Align import AlignInfo
gapped_protein = Gapped(IUPACProtein())

records = list(SeqIO.parse(open('tmp.fasta'), "fasta"))
for rec in records :
    #Override the default generic alphabet:
    rec.seq.alphabet = gapped_protein
#Turn these records into an alignment
alignment = SeqIO.to_alignment(records, gapped_protein)
del records

summary_align = AlignInfo.SummaryInfo(alignment)
print summary_align.dumb_consensus()
print summary_align.gap_consensus()

The problem with my previous shorter suggestion was the Bio.SeqIO
FASTA parser returned SeqRecord objects with a generic alphabet, while
the alignment summary expected a gapped alphabet.  I'm beginning to
think that the Bio.SeqIO.parse() function should allow an alphabet to
be specified as an optional argument for this sort of situation.

Alternatively, going back to your original code how about:

from Bio.Fasta import FastaAlign
from Bio.Align import AlignInfo

alignment = FastaAlign.parse_file('tmp.fasta',type='PROTEIN')
summary_align = AlignInfo.SummaryInfo(alignment)
print summary_align.dumb_consensus()
print summary_align.gap_consensus()

This works using Biopython 1.44 with either mxTextTools 2.0 or 3.0.
It should work with older versions of Biopython using mxTextTools 2.0
as well.

Peter
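
To make it clearer what a "dumb" consensus computes, here is a simplified, stdlib-only Python 3 sketch of a column-wise majority-rule consensus over the tmp.fasta alignment from this thread. It is an illustration in the spirit of `AlignInfo.SummaryInfo.dumb_consensus()`, not a copy of it: the 0.7 threshold, the `X` ambiguity character and skipping gap characters are assumptions, and `read_fasta_alignment`/`simple_consensus` are hypothetical names. The parser also enforces the equal-length requirement mentioned above.

```python
from collections import Counter


def read_fasta_alignment(text):
    """Parse FASTA text into (name, sequence) pairs of equal length."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif line:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    lengths = {len(seq) for _name, seq in records}
    if len(lengths) > 1:
        raise ValueError("Sequences differ in length: %s" % sorted(lengths))
    return records


def simple_consensus(records, threshold=0.7, ambiguous="X", gaps="-."):
    """Column-wise majority rule; gap characters are not counted."""
    consensus = []
    for column in zip(*(seq for _name, seq in records)):
        counts = Counter(c for c in column if c not in gaps)
        if not counts:  # an all-gap column
            consensus.append(ambiguous)
            continue
        letter, n = counts.most_common(1)[0]
        fraction = n / sum(counts.values())
        consensus.append(letter if fraction >= threshold else ambiguous)
    return "".join(consensus)


TMP_FASTA = """>seq2
DAC
>seq3
DC-
>seq1
DAD
>seq4
DDD
"""

alignment = read_fasta_alignment(TMP_FASTA)
print(simple_consensus(alignment))  # DXX
```

Column 2 gives `X` because, once the gap is ignored, D makes up only 2 of 3 residues, below the 0.7 threshold; lowering the threshold to 0.5 gives `DAD`.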


From mjldehoon at yahoo.com  Sat Jan  5 08:41:25 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 5 Jan 2008 00:41:25 -0800 (PST)
Subject: [BioPython] Bio.Ais
Message-ID: <140129.37367.qm@web62402.mail.re1.yahoo.com>

Hi everybody,

I was checking which Biopython modules access Entrez/GenBank in any way, and in the process found the script example_ais2.py in Bio/Ais/Examples (this is not related to Entrez/GenBank in any way, it just caught my eye because it imports urllib). 
Currently, this example script does not seem to work:

$ python example_ais2.py
Traceback (most recent call last):
  File "example_ais2.py", line 39, in <module>
    immune = Immune( align, alphabet, 100 )
...
TypeError: 'int' object is not iterable

The directory Bio/Ais/Examples and its file example_ais2.py only appear in CVS and are not included in Biopython releases.

Does anybody know how to fix this example? If not, what should we do with it?

--Michiel.

       

From meesters at uni-mainz.de  Mon Jan  7 18:13:59 2008
From: meesters at uni-mainz.de (Christian Meesters)
Date: Mon, 7 Jan 2008 19:13:59 +0100
Subject: [BioPython] Bio.PDB - adding 'dummy atoms'
Message-ID: <1199729639.13152.20.camel@meesters.biologie.uni-mainz.de>

Hoi,

I'd like to add 'dummy atoms' to a Bio.PDB Structure object. So far, I
have this approach:

new = Atom('OX', array([x, y, z]), 0, 1, 0, " OX ", serial_number)
residue.add(new)

Here x, y, and z are floating point numbers and serial_number is an
integer. 'residue' is a 'Residue' I'm iterating over. However, I keep
getting the following error message and don't have a clue, how to
proceed:

new = Atom('OX', array([x, y, z]), 0, 1, 0, " OX ", serial_number)
TypeError: object of type 'module' is not callable

Does anyone have a hint for me on how to actually add an atom, or what's wrong
here?

TIA
Christian


From biopython at maubp.freeserve.co.uk  Mon Jan  7 18:55:57 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 7 Jan 2008 18:55:57 +0000
Subject: [BioPython] Bio.PDB - adding 'dummy atoms'
In-Reply-To: <1199729639.13152.20.camel@meesters.biologie.uni-mainz.de>
References: <1199729639.13152.20.camel@meesters.biologie.uni-mainz.de>
Message-ID: <320fb6e00801071055n6bcb936dr58e96ac87b6e509d@mail.gmail.com>

Christian Meesters wrote:
> I'd like to add 'dummy atoms' to a Bio.PDB Structure object. So far, I
> have this approach:
> ...
> new = Atom('OX', array([x, y, z]), 0, 1, 0, " OX ", serial_number)
> TypeError: object of type 'module' is not callable
>
> Does anyone have a hint for me on how to actually add an atom, or what's wrong
> here?

I would infer from the error that "Atom" refers to the Bio.PDB.Atom
module, rather than the Bio.PDB.Atom.Atom class.  How did you do your
imports?  Try this:

from Bio.PDB.Atom import Atom

Peter
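
The same "module is not callable" mistake can be reproduced with the standard library (Python 3), where the `datetime` module also contains a `datetime` class, much as `Bio.PDB.Atom` contains `Atom`. This is only an analogy; it does not touch Bio.PDB itself.

```python
import datetime  # binds the *module*, analogous to the Bio.PDB.Atom module

try:
    datetime(2008, 1, 7)  # calling a module, as in the traceback above
except TypeError as err:
    module_error = err
    print("TypeError:", err)

from datetime import datetime  # now the name refers to the *class*

print(datetime(2008, 1, 7).isoformat())  # 2008-01-07T00:00:00
```

The fix is the same in both cases: import the class from the module, so the name you call is the class rather than the module.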


From lueck at ipk-gatersleben.de  Tue Jan  8 09:06:40 2008
From: lueck at ipk-gatersleben.de (=?iso-8859-1?Q?Stefanie_L=FCck?=)
Date: Tue, 8 Jan 2008 10:06:40 +0100
Subject: [BioPython] blastall does not exist at %s" % blastcmd
Message-ID: <002301c851d5$c7daac60$1022a8c0@ipkgatersleben.de>

Hi!

I'm trying to get a local BLAST running. I proceeded as described in the cookbook, but I always get this error message:
>>>
Traceback (most recent call last):
  File "F:\Blast\blast.py", line 10, in <module>
    my_blast_db, my_blast_file)
  File "C:\Python25\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 1499, in blastall
    raise ValueError, "blastall does not exist at %s" % blastcmd
ValueError: blastall does not exist at C:\Blast\bin\blastall.exe
<<<

>>>
My Code:

import Bio
from Bio.Blast import NCBIStandalone
import os

my_blast_db = r"F:\Blast\primerdb"
my_blast_file = "test.fasta"
my_blast_exe = r"C:\Blast\bin\blastall.exe"

result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn",
my_blast_db, my_blast_file)
blast_results = result_handle.read()
save_file = open("my_blast.xml", "w")
save_file.write(blast_results)
save_file.close()
<<<

blastall.exe is in this folder (I checked with os.listdir()), but the tool can't find it.

I'm using Python 2.5 and biopython-1.44.win32-py2.5.exe.

Does anyone have an idea where the problem is?

Greetings 
Stefanie


From biopython at maubp.freeserve.co.uk  Tue Jan  8 10:46:02 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Jan 2008 10:46:02 +0000
Subject: [BioPython] blastall does not exist at %s" % blastcmd
In-Reply-To: <002301c851d5$c7daac60$1022a8c0@ipkgatersleben.de>
References: <002301c851d5$c7daac60$1022a8c0@ipkgatersleben.de>
Message-ID: <320fb6e00801080246t5aa515ccuc8699134b533e8b9@mail.gmail.com>

On Jan 8, 2008 9:06 AM, Stefanie L?ck <lueck at ipk-gatersleben.de> wrote:
> Hi!
>
> I'm trying to get a local BLAST running. I proceeded as described in the cookbook,
> but I always get this error message:
> >>>
> Traceback (most recent call last):
>   File "F:\Blast\blast.py", line 10, in <module>
>     my_blast_db, my_blast_file)
>   File "C:\Python25\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 1499, in blastall
>     raise ValueError, "blastall does not exist at %s" % blastcmd
> ValueError: blastall does not exist at C:\Blast\bin\blastall.exe
> <<<
>
> >>>
> My Code:
>
> import Bio
> from Bio.Blast import NCBIStandalone
> import os
>
> my_blast_db = r"F:\Blast\primerdb"
> my_blast_file = "test.fasta"
> my_blast_exe = r"C:\Blast\bin\blastall.exe"
>
> result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn",
> my_blast_db, my_blast_file)
> ...
> blastall.exe is in this folder (I checked with os.listdir()), but the tool can't find it.
>

Could you try this, which is the test done in the Biopython blastall
function that triggers the error message you saw:

print os.path.exists(my_blast_exe)

Could you also double-check that the path is C:\Blast\bin\blastall.exe and
not perhaps C:\Blast\blastall.exe (the NCBI changed this at some point
on Windows)? Also, did you install it on the F: drive where your
database is, rather than C:?

> I'm using Python 2.5 and biopython-1.44.win32-py2.5.exe.

What version of standalone blast do you have?

Peter
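
The `os.path.exists()` test can be extended to try both install locations mentioned here. A minimal Python 3 sketch; `find_blastall` is a hypothetical helper and the candidate paths are simply the two from this thread.

```python
import os

# The two locations discussed in this thread; adjust for your install.
CANDIDATES = (r"C:\Blast\bin\blastall.exe", r"C:\Blast\blastall.exe")


def find_blastall(candidates=CANDIDATES):
    """Return the first candidate path that exists on disk, else None."""
    for path in candidates:
        if os.path.exists(path):  # the same check NCBIStandalone.blastall makes
            return path
    return None


print(find_blastall())  # None unless one of the paths exists
```

Passing the result (when it is not None) as the executable path avoids hard-coding a location that changed between BLAST releases.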


From lueck at ipk-gatersleben.de  Tue Jan  8 11:32:54 2008
From: lueck at ipk-gatersleben.de (=?iso-8859-1?Q?Stefanie_L=FCck?=)
Date: Tue, 8 Jan 2008 12:32:54 +0100
Subject: [BioPython] blastall does not exist at %s" % blastcmd
References: <002301c851d5$c7daac60$1022a8c0@ipkgatersleben.de>
	<320fb6e00801080246t5aa515ccuc8699134b533e8b9@mail.gmail.com>
Message-ID: <003a01c851ea$357e5cd0$1022a8c0@ipkgatersleben.de>

Thanks Peter!

C:\Blast\blastall.exe worked!
Sorry for the drive mistake, I have it on both...

But my XML file is empty :-(
I'll try to fix it...

My standalone BLAST version is blast-2.2.17-ia32-win32.exe.

Stefanie


From lueck at ipk-gatersleben.de  Tue Jan  8 14:18:08 2008
From: lueck at ipk-gatersleben.de (=?iso-8859-1?Q?Stefanie_L=FCck?=)
Date: Tue, 8 Jan 2008 15:18:08 +0100
Subject: [BioPython] empty xml after local blast
Message-ID: <007e01c85201$4b24b180$1022a8c0@ipkgatersleben.de>

Hi again!

I got blastall running, but my XML output file is empty...
Any ideas?
Where exactly must my FASTA file be?

>>>
Code:
import Bio
from Bio.Blast import NCBIStandalone
import os

my_blast_db = r"C:\Blast\primerdb"
my_blast_file = "test.fasta"
my_blast_exe = r"C:\Blast\blastall.exe"

result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn",
my_blast_db, my_blast_file)
blast_results = result_handle.read()
save_file = open("my_blast.xml", "w")
save_file.write(blast_results)
save_file.close()
>>>

I'm using Python 2.5, biopython-1.44.win32-py2.5.exe and blast-2.2.17-ia32-win32.exe

Regards 
Stefanie


From biopython at maubp.freeserve.co.uk  Tue Jan  8 14:33:29 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Jan 2008 14:33:29 +0000
Subject: [BioPython] empty xml after local blast
In-Reply-To: <007e01c85201$4b24b180$1022a8c0@ipkgatersleben.de>
References: <007e01c85201$4b24b180$1022a8c0@ipkgatersleben.de>
Message-ID: <320fb6e00801080633k652b3023r6a8457b4c97143e0@mail.gmail.com>

On Jan 8, 2008 2:18 PM, Stefanie L?ck <lueck at ipk-gatersleben.de> wrote:
> Hi again!
>
> I got blastall running, but my XML output file is empty...
> Any ideas?

Have you ever tried running blastall.exe from the command line "by
hand"?  This can be very useful, and would let you rule out several
basic problems (e.g. make sure blast is installed correctly, and that
your database is working).

> Where exactly must my FASTA file be?

Wherever you like, as long as you specify its location correctly.
Your code below seems to assume that "test.fasta" is in the current
directory (i.e. where you are running your python script from).  Is
this correct?

It may be simpler to use a full path, e.g.
my_blast_file = r"C:\temp\test.fasta"

I suspect that Standalone blast is not finding the input file, or that
it is not finding your database.  If you get an empty XML file, one
thing to try is checking the error output from the command line call:

print error_info.read()

Peter
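
One way to follow this advice without Biopython's wrapper is to run the command yourself and surface stderr whenever stdout comes back empty. A Python 3 sketch using `subprocess` rather than `NCBIStandalone.blastall`; `blastall_command` and `run_and_check` are hypothetical helpers, `-p`, `-d`, `-i` and `-m 7` (XML output) are legacy blastall options, and the paths are the ones from this thread.

```python
import subprocess


def blastall_command(exe, program, database, query):
    """Legacy blastall command line asking for XML output (-m 7)."""
    return [exe, "-p", program, "-d", database, "-i", query, "-m", "7"]


def run_and_check(cmd):
    """Run cmd and return its stdout; if stdout is empty, raise with stderr."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    if not result.stdout.strip():
        raise RuntimeError("No output from %s:\n%s" % (cmd[0], result.stderr))
    return result.stdout


# Paths from this thread; printing the command lets you also run it "by hand".
cmd = blastall_command(r"C:\Blast\blastall.exe", "blastn",
                       r"C:\Blast\primerdb", r"C:\temp\test.fasta")
print(" ".join(cmd))
```

A missing database or query file then shows up as a RuntimeError carrying blastall's own message instead of a silently empty XML file.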


From lueck at ipk-gatersleben.de  Tue Jan  8 15:18:32 2008
From: lueck at ipk-gatersleben.de (=?iso-8859-1?Q?Stefanie_L=FCck?=)
Date: Tue, 8 Jan 2008 16:18:32 +0100
Subject: [BioPython] empty xml after local blast
References: <007e01c85201$4b24b180$1022a8c0@ipkgatersleben.de>
	<320fb6e00801080633k652b3023r6a8457b4c97143e0@mail.gmail.com>
Message-ID: <009d01c85209$bb314210$1022a8c0@ipkgatersleben.de>

Thanks, it couldn't find the database!
Great help, thanks a lot ;-)


From meesters at uni-mainz.de  Tue Jan  8 16:12:09 2008
From: meesters at uni-mainz.de (Christian Meesters)
Date: Tue, 8 Jan 2008 17:12:09 +0100
Subject: [BioPython] Bio.PDB - adding 'dummy atoms'
In-Reply-To: <320fb6e00801071055n6bcb936dr58e96ac87b6e509d@mail.gmail.com>
References: <1199729639.13152.20.camel@meesters.biologie.uni-mainz.de>
	<320fb6e00801071055n6bcb936dr58e96ac87b6e509d@mail.gmail.com>
Message-ID: <1199808729.5401.75.camel@meesters.biologie.uni-mainz.de>

> I would infer from the error that "Atom" refers to the Bio.PDB.Atom
> module, rather than the Bio.PDB.Atom.Atom class.  How did you do your
> imports?  Try this:
> 
> from Bio.PDB.Atom import Atom
> 
> Peter
Ouch! Next time I'll try the tutor-list ;-).

Thanks a lot.

Christian


From quantrum75 at yahoo.com  Fri Jan 11 00:16:51 2008
From: quantrum75 at yahoo.com (quantrum75)
Date: Thu, 10 Jan 2008 16:16:51 -0800 (PST)
Subject: [BioPython] bio.PDB module
In-Reply-To: <mailman.1663.1199789171.2774.biopython@lists.open-bio.org>
Message-ID: <258224.6110.qm@web31404.mail.mud.yahoo.com>

Hi 
I am a Biopython newbie. I was wondering if someone could show me (or send me; I would be thankful) where I could find a script that can read a PDB file and output the phi and psi angles of the protein structure.
I have read through the Bio.PDB module and structural module documentation, but I still have no idea how to tackle the problem. I wish the Bio.PDB documentation were a bit more detailed and included some examples to work with. I would really like to contribute to the project, and if I got an initial idea of how to work with it, maybe I could contribute in some small way.
Thanks for your time
Regards
Rama
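
Phi and psi are dihedral angles between backbone atoms: phi(i) uses C(i-1), N(i), CA(i), C(i), and psi(i) uses N(i), CA(i), C(i), N(i+1). The stdlib-only Python 3 sketch below assumes you have already extracted the N, CA and C coordinates of consecutive residues (with Bio.PDB, for example, via each atom's `get_coord()`); it does not detect chain breaks, and `dihedral`/`phi_psi` are hypothetical names. Current Biopython releases also provide this through Bio.PDB's `PPBuilder` and `Polypeptide.get_phi_psi_list()`, which reports radians.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(backbone):
    """backbone: list of dicts holding 'N', 'CA', 'C' coordinates per residue.

    Returns (phi, psi) per residue; the first phi and the last psi are
    undefined and reported as None.
    """
    angles = []
    for i, res in enumerate(backbone):
        phi = psi = None
        if i > 0:
            phi = dihedral(backbone[i - 1]["C"], res["N"], res["CA"], res["C"])
        if i < len(backbone) - 1:
            psi = dihedral(res["N"], res["CA"], res["C"], backbone[i + 1]["N"])
        angles.append((phi, psi))
    return angles
```

The sign follows the usual convention: looking along the central bond, a clockwise rotation from the near bond to the far bond is positive.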


biopython-request at lists.open-bio.org wrote: Send BioPython mailing list submissions to
 biopython at lists.open-bio.org

To subscribe or unsubscribe via the World Wide Web, visit
 http://lists.open-bio.org/mailman/listinfo/biopython
or, via email, send a message with subject or body 'help' to
 biopython-request at lists.open-bio.org

You can reach the person managing the list at
 biopython-owner at lists.open-bio.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of BioPython digest..."


Today's Topics:

   1. Re: [BioSQL-l] Authority in biodatabase table (Peter)
   2. Re: FormatConverter: from Fasta format to ClustalW format
      (Lee,Byung-chul)
   3. Re: FormatConverter: from Fasta format to ClustalW format (Peter)
   4. Re: FormatConverter: from Fasta format to ClustalW format (Peter)
   5. Bio.Ais (Michiel de Hoon)
   6. Bio.PDB - adding 'dummy atoms' (Christian Meesters)
   7. Re: Bio.PDB - adding 'dummy atoms' (Peter)
   8. blastall does not exist at %s" % blastcmd (Stefanie L?ck)
   9. Re: blastall does not exist at %s" % blastcmd (Peter)


----------------------------------------------------------------------

Message: 1
Date: Wed, 2 Jan 2008 11:57:46 +0000
From: Peter 
Subject: Re: [BioPython] [BioSQL-l] Authority in biodatabase table
To: "Hilmar Lapp" 
Cc: biopython at lists.open-bio.org, biosql-l at lists.open-bio.org
Message-ID:
 <320fb6e00801020357g724917b5s853d99f2f953753a at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On 1/1/08, Hilmar Lapp  wrote:
> (Sorry for this long-too-late reply. Going through old email that got
> left unread or unresponded.)
>
> Peter - you probably implemented something meanwhile that suits your
> needs. Just FYI, BioPerl leaves this empty too. The general notion
> for authority is that of the LSID authority field, but of course you
> won't be able to parse this out of any input file. The value for
> SwissProt would be uniprot.org, for example. For NCBI, I'm not sure -
> NCBI hasn't ever issued any LSIDs, but presumably it would be
> something like ncbi.nlm.nih.gov.
>
>        -hilmar

Thank you Hilmar.

It seems that the current code in Biopython is fine (the authority
field is left blank by default, unless the user supplies their own
value), and consistent with both BioPerl and BioJava in this regard
(thanks Richard).

Peter


------------------------------

Message: 2
Date: Wed, 02 Jan 2008 22:44:47 +0900
From: "Lee,Byung-chul" 
Subject: Re: [BioPython] FormatConverter: from Fasta format to
 ClustalW format
To: biopython at lists.open-bio.org
Message-ID: <477B954F.9020004 at kaist.ac.kr>
Content-Type: text/plain; charset=EUC-KR


Thank you very much for your kind reply, Peter.

Following your explanation, I tried to use SeqIO, but another error occurred.
This is what I did:

-----------------
from Bio import SeqIO
from Bio.Align import AlignInfo
alignment = SeqIO.to_alignment(SeqIO.parse(open('tmp.fasta'), "fasta"))
summary_align = AlignInfo.SummaryInfo(alignment)
print summary_align.dumb_consensus()
--------------------
but the results are
-----------------
Traceback (most recent call last):
  File "tmp.py", line 16, in <module>
    print summary_align.dumb_consensus()
  File "/var/lib/python-support/python2.5/Bio/Align/AlignInfo.py", line 111, in dumb_consensus
    consensus_alpha = self._guess_consensus_alphabet()
  File "/var/lib/python-support/python2.5/Bio/Align/AlignInfo.py", line 189, in _guess_consensus_alphabet
    ("Non-gapped alphabet found in alignment object.")
ValueError: Non-gapped alphabet found in alignment object.
---------------------
In addition, all sequences have the same length in my tmp.fasta file:
-----
>seq2
DAC
>seq3 
DC-
>seq1 
DAD
>seq4
DDD

Is this problem caused by the Biopython/Martel and mxTextTools versions?
I am using biopython 1.43-2 (ubuntu version) and mxtexttools 3.0.0-2ubuntu1.

What should I do for this? Thanks.

Byung chul.

Peter wrote:
> Hello Byung chul Lee,
>
> On 1/2/08, Lee,Byung-chul wrote:
>   
>> Dear colleagues.
>>
>> I want to use the AlignInfo.SummaryInfo for fasta-format alignment file.
>> I think that to do the process firstly the fasta format should be
>> converted to clustalw format, so I try to use Formatconverter.
>> However, at my trial, I cannot do that.
>>     
>
> Once you have an alignment object (loaded from any file format), this
> should work with AlignInfo.  I don't think you need to convert it from
> FASTA to ClustalW.
>
> I would guess the error you saw is a problem with Biopython/Martel and
> mxTextTools 3.0, which isn't 100% compatible with mxTextTools 2.0.
> What version of Biopython are you using, as I would have expected this
> to work fine with Biopython 1.44?
>
> You could also try using Bio.SeqIO to load the FASTA format alignment
> file instead, see http://biopython.org/wiki/SeqIO
>
> from Bio import SeqIO
> from Bio.Align import AlignInfo
> alignment = SeqIO.to_alignment(SeqIO.parse(open('tmp.fasta'), "fasta"))
> summary_align = AlignInfo.SummaryInfo(alignment)
>
> Peter
>
>   


------------------------------

Message: 3
Date: Wed, 2 Jan 2008 17:46:25 +0000
From: Peter 
Subject: Re: [BioPython] FormatConverter: from Fasta format to
 ClustalW format
To: "Lee,Byung-chul" 
Cc: biopython at lists.open-bio.org
Message-ID:
 <320fb6e00801020946j5b331137s14f9e1d90e888a2e at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Jan 2, 2008 1:44 PM, Lee,Byung-chul  wrote:
> As your explanation, I tried to use SeqIO, but another error occured
> I did it like below:

My fault, sorry. I wasn't at a computer with Biopython installed, I
had to guess.  I'll try and put together a proper example for you
tomorrow.

> Is this problem caused by the Biopython/Martel and mxTextTools vesions?
> I am using biopython 1.43-2 (ubuntu version) and mxtexttools 3.0.0-2ubuntu1.

The original problem you reported was due to the combination of
Biopython 1.43 (the Martel module) and mxTextTools 3.0. You can either
update to Biopython 1.44 or downgrade to mxTextTools 2.0 - neither is
going to be very simple if you want to use the Ubuntu repositories.
To avoid this Martel problem, I would suggest you un-install Biopython
1.43 from the Ubuntu repository, and then install Biopython 1.44 from
source.

Peter


------------------------------

Message: 4
Date: Fri, 4 Jan 2008 13:20:26 +0000
From: Peter 
Subject: Re: [BioPython] FormatConverter: from Fasta format to
 ClustalW format
To: "Lee,Byung-chul" 
Cc: biopython at lists.open-bio.org
Message-ID:
 <320fb6e00801040520i11c9a4c4q4449cee34da00706 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Jan 2, 2008 5:46 PM, Peter wrote:
> On Jan 2, 2008 1:44 PM, Lee,Byung-chul wrote:
> > As your explanation, I tried to use SeqIO, but another error occured
> > I did it like below:
>
> My fault, sorry. I wasn't at a computer with Biopython installed, I
> had to guess.  I'll try and put together a proper example for you
> tomorrow.

This should work on Biopython 1.43 or later; I have tested it using
the simple FASTA file you gave earlier:

from Bio.Alphabet.IUPAC import IUPACProtein
from Bio.Alphabet import Gapped
from Bio import SeqIO
from Bio.Align import AlignInfo
gapped_protein = Gapped(IUPACProtein())

records = list(SeqIO.parse(open('tmp.fasta'), "fasta"))
for rec in records :
    #Override the default generic alphabet:
    rec.seq.alphabet = gapped_protein
#Turn these records into an alignment
alignment = SeqIO.to_alignment(records, gapped_protein)
del records

summary_align = AlignInfo.SummaryInfo(alignment)
print summary_align.dumb_consensus()
print summary_align.gap_consensus()

The problem with my previous shorter suggestion was the Bio.SeqIO
FASTA parser returned SeqRecord objects with a generic alphabet, while
the alignment summary expected a gapped alphabet.  I'm beginning to
think that the Bio.SeqIO.parse() function should allow an alphabet to
be specified as an optional argument for this sort of situation.
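
To make it clearer what the consensus step does once the records are loaded, here is a standalone sketch in plain Python (no Bio.Alphabet involved, and not Biopython's actual code): it reads the FASTA text, checks that the sequences all have the same length, then for each column takes the most common non-gap letter if it reaches a threshold, otherwise an ambiguity character. The helper names are hypothetical, and the 0.7 threshold and "X" are assumed defaults chosen to mirror dumb_consensus().

```python
from collections import Counter

def read_fasta(text):
    """Parse FASTA text into a list of (title, sequence) pairs."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append((line[1:].strip(), []))
        elif records:
            records[-1][1].append(line)
    return [(title, "".join(chunks)) for title, chunks in records]

def simple_consensus(seqs, threshold=0.7, ambiguous="X", gap="-"):
    """Column-wise majority consensus that ignores gap characters."""
    if not seqs:
        return ""
    if len(set(len(s) for s in seqs)) != 1:
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*seqs):
        counts = Counter(c for c in column if c != gap)
        total = sum(counts.values())
        if total == 0:
            consensus.append(ambiguous)  # an all-gap column
            continue
        letter, count = counts.most_common(1)[0]
        consensus.append(letter if float(count) / total >= threshold else ambiguous)
    return "".join(consensus)

fasta_text = """>seq2
DAC
>seq3
DC-
>seq1
DAD
>seq4
DDD
"""
seqs = [seq for title, seq in read_fasta(fasta_text)]
print(simple_consensus(seqs))                 # DXX
print(simple_consensus(seqs, threshold=0.5))  # DAD
```

With the 0.7 threshold only the first column is conserved strongly enough; the third column is 2 of 3 non-gap letters, which falls just short.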

Alternatively, going back to your original code how about:

from Bio.Fasta import FastaAlign
from Bio.Align import AlignInfo

alignment = FastaAlign.parse_file('tmp.fasta',type='PROTEIN')
summary_align = AlignInfo.SummaryInfo(alignment)
print summary_align.dumb_consensus()
print summary_align.gap_consensus()

This works using Biopython 1.44 with either mxTextTools 2.0 or 3.0.
It should work with older versions of Biopython using mxTextTools 2.0
as well.

Peter


------------------------------

Message: 5
Date: Sat, 5 Jan 2008 00:41:25 -0800 (PST)
From: Michiel de Hoon 
Subject: [BioPython] Bio.Ais
To: biopython at lists.open-bio.org, biopython-dev at lists.open-bio.org
Message-ID: <140129.37367.qm at web62402.mail.re1.yahoo.com>
Content-Type: text/plain; charset=iso-8859-1

Hi everybody,

I was checking which Biopython modules access Entrez/GenBank in any way, and in the process found the script example_ais2.py in Bio/Ais/Examples (this is not related to Entrez/GenBank in any way, it just caught my eye because it imports urllib). 
Currently, this example script does not seem to work:

$ python example_ais2.py
Traceback (most recent call last):
  File "example_ais2.py", line 39, in 
    immune = Immune( align, alphabet, 100 )
...
TypeError: 'int' object is not iterable

The directory Bio/Ais/Examples and its file example_ais2.py only appears in CVS and is not included in Biopython releases.

Does anybody know how to fix this example? If not, what should we do with it?

--Michiel.

       


------------------------------

Message: 6
Date: Mon, 7 Jan 2008 19:13:59 +0100
From: Christian Meesters 
Subject: [BioPython] Bio.PDB - adding 'dummy atoms'
To: "biopython at lists.open-bio.org" 
Message-ID: <1199729639.13152.20.camel at meesters.biologie.uni-mainz.de>
Content-Type: text/plain

Hoi,

I'd like to add 'dummy atoms' to a Bio.PDB Structure object. So far, I
have this approach:

new = Atom('OX', array([x, y, z]), 0, 1, 0, " OX ", serial_number)
residue.add(new)

Here x, y, and z are floating point numbers and serial_number is an
integer. 'residue' is a 'Residue' I'm iterating over. However, I keep
getting the following error message and don't have a clue, how to
proceed:

new = Atom('OX', array([x, y, z]), 0, 1, 0, " OX ", serial_number)
TypeError: object of type 'module' is not callable

Does anyone have a hint for me on how to actually add an atom, or on what's wrong
here?

TIA
Christian


------------------------------

Message: 7
Date: Mon, 7 Jan 2008 18:55:57 +0000
From: Peter 
Subject: Re: [BioPython] Bio.PDB - adding 'dummy atoms'
To: "Christian Meesters" 
Cc: "biopython at lists.open-bio.org" 
Message-ID:
 <320fb6e00801071055n6bcb936dr58e96ac87b6e509d at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Christian Meesters wrote:
> I'd like to add 'dummy atoms' to a Bio.PDB Structure object. So far, I
> have this approach:
> ...
> new = Atom('OX', array([x, y, z]), 0, 1, 0, " OX ", serial_number)
> TypeError: object of type 'module' is not callable
>
> Does anyone have a hint for me, how actually add an atom or what's wrong
> here?

I would infer from the error that "Atom" refers to the Bio.PDB.Atom
module, rather than the Bio.PDB.Atom.Atom class.  How did you do your
imports?  Try this:

from Bio.PDB.Atom import Atom

Peter
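
The same kind of TypeError can be reproduced with nothing but the standard library, because the datetime module also contains a class called datetime. This is only an analogy for the Bio.PDB.Atom module versus the Bio.PDB.Atom.Atom class; the two function names are made up for the demonstration.

```python
def call_module_by_mistake():
    """Bind the *module* named datetime, then try to call it like a class."""
    import datetime
    try:
        datetime(2008, 1, 7)  # TypeError: the module itself is not callable
    except TypeError as err:
        return str(err)
    return None

def call_class():
    """Bind the *class* datetime from inside the datetime module instead."""
    from datetime import datetime
    return datetime(2008, 1, 7)

print(call_module_by_mistake())
print(call_class())  # 2008-01-07 00:00:00
```

The fix is the same in both cases: import the class from the module ("from datetime import datetime", or "from Bio.PDB.Atom import Atom") rather than the module itself.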


------------------------------

Message: 8
Date: Tue, 8 Jan 2008 10:06:40 +0100
From: Stefanie Lück
Subject: [BioPython] blastall does not exist at %s" % blastcmd
To: 
Message-ID: <002301c851d5$c7daac60$1022a8c0 at ipkgatersleben.de>
Content-Type: text/plain; charset="iso-8859-1"

Hi!

I'm trying to get a local BLAST running. I followed the steps described in the Cookbook, but I always get this error message:
>>>
Traceback (most recent call last):
  File "F:\Blast\blast.py", line 10, in <module>
    my_blast_db, my_blast_file)
  File "C:\Python25\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 1499, in blastall
    raise ValueError, "blastall does not exist at %s" % blastcmd
ValueError: blastall does not exist at C:\Blast\bin\blastall.exe
<<<

>>>
My Code:

import Bio
from Bio.Blast import NCBIStandalone
import os

my_blast_db = r"F:\Blast\primerdb"
my_blast_file = "test.fasta"
my_blast_exe = r"C:\Blast\bin\blastall.exe"

result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn",
my_blast_db, my_blast_file)
blast_results = result_handle.read()
save_file = open("my_blast.xml", "w")
save_file.write(blast_results)
save_file.close()
<<<

blastall.exe is in this folder (checked with os.listdir()), but the tool can't find it.

I'm using Python 2.5 and biopython-1.44.win32-py2.5.exe.

Does anyone have an idea where the problem is?

Greetings 
Stefanie


------------------------------

Message: 9
Date: Tue, 8 Jan 2008 10:46:02 +0000
From: Peter 
Subject: Re: [BioPython] blastall does not exist at %s" % blastcmd
To: "Stefanie Lück"
Cc: biopython at lists.open-bio.org
Message-ID:
 <320fb6e00801080246t5aa515ccuc8699134b533e8b9 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Jan 8, 2008 9:06 AM, Stefanie Lück wrote:
> Hi!
>
> I'm trying to get a local blast running. I proceeded as described in the cookbook
> but I allways get this Error message:
> >>>
> Traceback (most recent call last):
>   File "F:\Blast\blast.py", line 10, in <module>
>     my_blast_db, my_blast_file)
>   File "C:\Python25\Lib\site-packages\Bio\Blast\NCBIStandalone.py", line 1499, in blastall
>     raise ValueError, "blastall does not exist at %s" % blastcmd
> ValueError: blastall does not exist at C:\Blast\bin\blastall.exe
> <<<
>
> >>>
> My Code:
>
> import Bio
> from Bio.Blast import NCBIStandalone
> import os
>
> my_blast_db = r"F:\Blast\primerdb"
> my_blast_file = "test.fasta"
> my_blast_exe = r"C:\Blast\bin\blastall.exe"
>
> result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn",
> my_blast_db, my_blast_file)
> ...
> blastall.exe is in this folder (checked by os.listdir()) but can't be found from the tool.
>

Could you try this, which is the test done in the Biopython blastall
function that triggers the error message you saw:

print os.path.exists(my_blast_exe)

Could you also double check the path is C:\Blast\bin\blastall.exe and
not perhaps C:\Blast\blastall.exe (the NCBI changed this at some point
on Windows). Also did you install it to the F: drive where your
database is, rather than C: ?
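
A small sketch of the same existence check, trying a few candidate locations in turn. It uses os.path.isfile, which is slightly stricter than the os.path.exists test mentioned above since it also rejects directories. The candidate paths are only the ones mentioned in this thread and stand in for wherever blastall is actually installed; find_executable is a hypothetical helper. The raw string prefix r"..." matters on Windows: without it, the "\b" in "\bin" and "\blastall" is read as a backspace character, and the existence check fails even though the file is there.

```python
import os

def find_executable(candidates):
    """Return the first candidate path that is an existing file, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None

# Placeholder locations taken from this thread; note the raw strings.
candidates = [
    r"C:\Blast\bin\blastall.exe",
    r"C:\Blast\blastall.exe",
    r"F:\Blast\bin\blastall.exe",
]
my_blast_exe = find_executable(candidates)
if my_blast_exe is None:
    print("blastall.exe was not found in any of the candidate locations")
else:
    print("Using %s" % my_blast_exe)
```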

> I'm using Python 2.5 and biopython-1.44.win32-py2.5.exe.

What version of standalone blast do you have?

Peter


------------------------------

_______________________________________________
BioPython mailing list  -  BioPython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython


End of BioPython Digest, Vol 61, Issue 2
****************************************




From lee.byung-chul at kaist.ac.kr  Fri Jan 11 03:15:02 2008
From: lee.byung-chul at kaist.ac.kr (Lee,Byung-chul)
Date: Fri, 11 Jan 2008 12:15:02 +0900
Subject: [BioPython] bio.PDB module
In-Reply-To: <258224.6110.qm@web31404.mail.mud.yahoo.com>
References: <258224.6110.qm@web31404.mail.mud.yahoo.com>
Message-ID: <4786DF36.6070102@kaist.ac.kr>


quantrum75 wrote:

> > Hi 
> > I am a biopython newbie. I was wondering if someone could show me or send me ( I would be thankful) where I could find a script which can read a pdb file and out the phi and psi angles of the protein structure.
> > I have read through the bio.PDB module and structural module documentation, but still do not have an idea on how to proceed to tackle the problem. I wish the bio.PDB documentation was a bit more detailed and included some examples to work with. I really would like to contribute to the project and maybe if I got an initial idea on how to work with the same, I can contribute in some small way.
> > Thanks for your time
> > Regards
> > Rama
> >   
>   
I think the web page below can help you; take a look:
http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/ramachandran/calculate/

Byung chul.
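
The calculation behind that page comes down to torsion angles over backbone atoms: phi for residue i uses C(i-1), N(i), CA(i), C(i), and psi uses N(i), CA(i), C(i), N(i+1). Below is a minimal standard-library sketch of that geometry (not Bio.PDB code), assuming consecutive residues with no chain breaks; the input format (a list of dicts of (x, y, z) tuples) and the function names are hypothetical.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, in the range -180..180."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def phi_psi(backbone):
    """Return (phi, psi) per residue; None where the angle is undefined."""
    angles = []
    for i, res in enumerate(backbone):
        phi = psi = None
        if i > 0:
            phi = dihedral(backbone[i - 1]["C"], res["N"], res["CA"], res["C"])
        if i < len(backbone) - 1:
            psi = dihedral(res["N"], res["CA"], res["C"], backbone[i + 1]["N"])
        angles.append((phi, psi))
    return angles

# Made-up coordinates, chosen so that phi of the second residue is 90 degrees.
backbone = [
    {"N": (5.0, 5.0, 5.0), "CA": (6.0, 5.0, 5.0), "C": (1.0, 0.0, 0.0)},
    {"N": (0.0, 0.0, 0.0), "CA": (0.0, 0.0, 1.0), "C": (0.0, 1.0, 1.0)},
]
for phi, psi in phi_psi(backbone):
    print(phi, psi)
```

The first residue has no phi and the last has no psi, which is why they come back as None.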


From mjldehoon at yahoo.com  Fri Jan 11 11:16:45 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 11 Jan 2008 03:16:45 -0800 (PST)
Subject: [BioPython] [Biopython-dev] Bio.Ais
In-Reply-To: <140129.37367.qm@web62402.mail.re1.yahoo.com>
Message-ID: <426295.9925.qm@web62415.mail.re1.yahoo.com>

Looking at this again: currently we have no documentation for Bio.Ais, no maintainer, and no apparent users (at least, I couldn't find any in the mailing list archives). Would anybody mind very much if I marked this module as deprecated?
I am mainly asking to find out whether there are any users of this code out there.

--Michiel.



From biopython at maubp.freeserve.co.uk  Fri Jan 11 11:51:41 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 11 Jan 2008 11:51:41 +0000
Subject: [BioPython] bio.PDB module
In-Reply-To: <258224.6110.qm@web31404.mail.mud.yahoo.com>
References: <mailman.1663.1199789171.2774.biopython@lists.open-bio.org>
	<258224.6110.qm@web31404.mail.mud.yahoo.com>
Message-ID: <320fb6e00801110351x204102fft44dd3b1e914bfee3@mail.gmail.com>

On Jan 11, 2008 12:16 AM, quantrum75 <quantrum75 at yahoo.com> wrote:
> Hi
> I am a biopython newbie. I was wondering if someone could show me or send me
> ( I would be thankful) where I could find a script which can read a pdb file and out
> the phi and psi angles of the protein structure.

I see Byung chul has already suggested reading this page:
http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/ramachandran/calculate/

Do you think we should incorporate some of that into the main Biopython
documentation?

> I have read through the bio.PDB module and structural module documentation,
> but still do not have an idea on how to proceed to tackle the problem. I wish the
> bio.PDB documentation was a bit more detailed and included some examples to
> work with.

Have you read the Biopython Structural Bioinformatics FAQ,
http://biopython.org/DIST/docs/cookbook/biopdb_faq.pdf
This is linked to from our documentation webpage, but doesn't seem to
be mentioned in the main Biopython Tutorial and Cookbook...

> I really would like to contribute to the project and maybe if I got an
> initial idea on how to work with the same, I can contribute in some small way.

Maybe you could start a "Getting started with Bio.PDB" page on the Wiki?

Peter


From quantrum75 at yahoo.com  Fri Jan 11 13:26:23 2008
From: quantrum75 at yahoo.com (quantrum75)
Date: Fri, 11 Jan 2008 05:26:23 -0800 (PST)
Subject: [BioPython] bio.PDB module
In-Reply-To: <320fb6e00801110351x204102fft44dd3b1e914bfee3@mail.gmail.com>
Message-ID: <496455.11121.qm@web31409.mail.mud.yahoo.com>

Hi Peter,
  Thanks for your reply. I did go through the links you mentioned, including the Structural Bioinformatics FAQ. However, I feel the documentation for the Bio.PDB module is seriously short on practical examples for a person like me who likes to learn from examples.
  I would love to write a "Getting started with Bio.PDB" wiki page or document with examples. However, I first need to get the basic ideas of how to use the module, which I have not been able to get from the current documentation; that is why I asked for a script which computes the phi and psi angles from a PDB file.
  I'll see what I can do. If you could direct me to any resources, that would be great.
  Thanks
  Rama



From jdieten at gmail.com  Tue Jan 15 13:26:08 2008
From: jdieten at gmail.com (Joost van Dieten)
Date: Tue, 15 Jan 2008 14:26:08 +0100
Subject: [BioPython] [Biopython] Blast problem
Message-ID: <4ac065b80801150526q79215288k7e6a0e633d83f1c4@mail.gmail.com>

Hi Everyone,

I'm having a problem with hsp.match in the Biopython Blast
module.
A few weeks ago the hsp.match returned me the following:

ATGGCA++TGG

But now it gives me:

ATGGCA TGG

I can't see the number of gaps anymore. Does anyone have a solution for this?

Best regards,

Joost van Dieten


From biopython at maubp.freeserve.co.uk  Tue Jan 15 14:09:47 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 15 Jan 2008 14:09:47 +0000
Subject: [BioPython] [Biopython] Blast problem
In-Reply-To: <4ac065b80801150526q79215288k7e6a0e633d83f1c4@mail.gmail.com>
References: <4ac065b80801150526q79215288k7e6a0e633d83f1c4@mail.gmail.com>
Message-ID: <320fb6e00801150609y16c77bdch927dd6d9689996a5@mail.gmail.com>

Hi Joost,

> Iam having a problem with the hsp.match function in the Bio-Python Blast
> module. A few weeks ago the hsp.match returned me the following:
>
> ATGGCA++TGG
>
> But now it gives me:
>
> ATGGCA TGG
>
> I can't see the number of gaps anymore, anyone a solution for this?

Are you using the online version of blast with Biopython?  Perhaps the
NCBI changed something.  Are you parsing the XML output or the plain
text?  Can you provide any more information (e.g. which version of
Biopython).
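
Whatever changed in the output, the gaps can still be counted from the aligned sequences themselves rather than from the match line, since BLAST writes gaps as "-" in the aligned query and subject strings (hsp.query and hsp.sbjct in the Biopython record; the XML output also has an Hsp_gaps element). A minimal sketch, with made-up example strings and a hypothetical helper name:

```python
def count_gaps(aligned, gap="-"):
    """Return (gap characters, gap openings) for one aligned sequence string."""
    characters = aligned.count(gap)
    openings = 0
    previous = None
    for letter in aligned:
        if letter == gap and previous != gap:
            openings += 1  # first '-' of a new run of gaps
        previous = letter
    return characters, openings

# Made-up HSP strings: two gap characters in a single run in the subject.
query = "ATGGCAGTTGG"
sbjct = "ATGGCA--TGG"
for label, seq in (("query", query), ("sbjct", sbjct)):
    chars, runs = count_gaps(seq)
    print("%s: %d gap characters in %d gap(s)" % (label, chars, runs))
```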

Thanks

Peter


From luca.beltrame at unimi.it  Thu Jan 17 14:12:55 2008
From: luca.beltrame at unimi.it (Luca Beltrame)
Date: Thu, 17 Jan 2008 15:12:55 +0100
Subject: [BioPython] KEGG Gene parser?
Message-ID: <200801171512.55898.luca.beltrame@unimi.it>

Hello.

I'd like to know if there is a parser that can handle KEGG gene entries. As far
as I can see, Bio.KEGG can only do Compound and Enzyme.
If there is a need, I'm thinking about writing one, but since someone posted
something in 2004 (no longer available), I'm asking the list first.

Thanks.


From lueck at ipk-gatersleben.de  Mon Jan 21 11:21:52 2008
From: lueck at ipk-gatersleben.de (Stefanie Lück)
Date: Mon, 21 Jan 2008 12:21:52 +0100
Subject: [BioPython] blastall questions (output, full length subject)
Message-ID: <001901c85c1f$d279ac30$1022a8c0@ipkgatersleben.de>

Hi!

 
I need some advice again for a local BLAST with blastall.

First of all, everything works fine; I just have some questions on how to continue:

1) How can I see the full length of the subject? I can only ever see the part which matches the query.

2) What would you suggest for continuing with the XML output? I want to sort the hits by % match, and my idea was to put everything in a dictionary (% match as key and all the remaining information as values).

Is this the right way?


Greetings

Stefanie

 
From winter at biotec.tu-dresden.de  Mon Jan 21 13:18:15 2008
From: winter at biotec.tu-dresden.de (Christof Winter)
Date: Mon, 21 Jan 2008 14:18:15 +0100
Subject: [BioPython] blastall questions (output, full length subject)
In-Reply-To: <001901c85c1f$d279ac30$1022a8c0@ipkgatersleben.de>
References: <001901c85c1f$d279ac30$1022a8c0@ipkgatersleben.de>
Message-ID: <47949B97.90205@biotec.tu-dresden.de>

Stefanie Lück wrote:
> Hi!
> 
> I need again some advice for a local blast with blastall.
> 
> First of all, everything works fine, I just have some questions on how to
> continue:
> 
> 1) How can I see the full length of the subject? I always can see only this
> part, which is matching with the query.

Hi Stefanie,

you suffered from the slightly confusing naming in the BioPython NCBIXML class.
Here is an explanation:

alignment.length = total length of unaligned hit sequence
record.query_letters = length of query sequence
len(hsp.query) = len(hsp.match) = len(hsp.sbjct) = length of alignment

with

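from Bio.Blast import NCBIXML

parser = NCBIXML.BlastParser()
records = parser.parse(open(blast_results_file))

for record in records:
    for alignment in record.alignments:
        for hsp in alignment.hsps:
            # do something with each HSP here, e.g. look at
            # hsp.query, hsp.match and hsp.sbjct
            pass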

> 2) How are your suggestions to continue with the xml output? I want to sort
> the Hits by % of matching and my idea was it to put everything in a
> dictionary (%match as key and all the rest information's as values).

If you mean the sequence identity percentage, you can use
sequenceIdentity = 100.0 * int(hsp.identities) / len(hsp.query)

To use the sequence identity as key in a dictionary, you would have to keep a 
list (or set) of records as value, since different records (hits) can have the 
same sequence identity.

I would recommend just keeping a set (or list) of records, and using the key or
cmp parameter of Python's sort function to sort by one field of the record:
http://wiki.python.org/moin/HowTo/Sorting

If you only need some of the information from each record, it might be even easier to
store this information in a tuple, and keep a set (or list) of these tuples.
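
A minimal sketch of the sort-by-key approach. So that it runs without a BLAST file, a namedtuple stands in for the three values you would read from each HSP (a title, hsp.identities and the alignment length, len(hsp.query)); the Hit type, its field names and the numbers are hypothetical. Using 100.0 rather than 100 avoids integer division truncating the percentage under Python 2.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the values taken from each parsed HSP.
Hit = namedtuple("Hit", ["title", "identities", "align_length"])

def percent_identity(hit):
    """Identities as a percentage of the alignment length."""
    return 100.0 * hit.identities / hit.align_length

hits = [Hit("hit_a", 45, 50), Hit("hit_b", 50, 50), Hit("hit_c", 90, 100)]

# Best first; sorted() is stable, so ties keep their original order.
ranked = sorted(hits, key=percent_identity, reverse=True)
for hit in ranked:
    print("%s\t%.1f%%" % (hit.title, percent_identity(hit)))

# If a dictionary keyed by identity is still wanted, each key needs a list,
# because several hits can share the same identity.
by_identity = defaultdict(list)
for hit in hits:
    by_identity[percent_identity(hit)].append(hit.title)
```

Here hit_a and hit_c both come out at 90.0%, which is exactly the case where a plain identity-to-record dictionary would lose one of them.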

HTH,
Christof

PS: Maybe we could enrich NCBIXML.py with some more meaningful variable names?



From biopython at maubp.freeserve.co.uk  Mon Jan 21 15:15:45 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 21 Jan 2008 15:15:45 +0000
Subject: [BioPython] KEGG Gene parser?
In-Reply-To: <200801171512.55898.luca.beltrame@unimi.it>
References: <200801171512.55898.luca.beltrame@unimi.it>
Message-ID: <320fb6e00801210715n33093e95t40de5f921fe1fd47@mail.gmail.com>

On Jan 17, 2008 Luca Beltrame wrote:
> Hello.
>
> I'd like to know if there is a parser that can parse KEGG gene entries. As far
> as I can see, Bio.KEGG can only do Compound and Enzyme.

And there is also Bio.KEGG.Map, but you are right, there doesn't seem
to be anything for KEGG gene entries.

> Should there be the need I'm thinking about writing one, but since in 2004
> someone had posted something (now no longer available), I'm asking the list
> first.

It looks like no one else is working on any KEGG code, so if you still want to
write something it could be useful. Are you happy to write this in
a similar style to the existing Bio.KEGG modules, and put together
some basic documentation and a test case too?
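
For anyone picking this up, here is a rough sketch of the flat-file layout such a parser has to deal with, written in plain Python rather than in the style of the existing Bio.KEGG modules: in KEGG flat files the field name occupies the first 12 columns, continuation lines leave that area blank, and each entry ends with a "///" line. The function name and the example entry are hypothetical, and indented sub-fields (such as those under REFERENCE) are simply treated as fields of their own.

```python
def parse_kegg(lines):
    """Yield one dict per KEGG flat-file entry, mapping field -> list of lines."""
    entry = {}
    field = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("///"):
            if entry:
                yield entry
            entry, field = {}, None
            continue
        name = line[:12].strip()   # field name area, columns 1-12
        value = line[12:].strip()  # field content
        if name:
            field = name           # a new field starts here
        if field is None:
            continue               # ignore anything before the first field
        entry.setdefault(field, []).append(value)
    if entry:
        yield entry                # tolerate a missing final '///'

# Hypothetical entry, built with the 12-column field layout.
example = [
    "%-12s%s" % ("ENTRY", "example1          CDS"),
    "%-12s%s" % ("NAME", "abcA"),
    "%-12s%s" % ("DEFINITION", "made-up protein used only to show"),
    "%-12s%s" % ("", "how continuation lines are joined"),
    "///",
]
for record in parse_kegg(example):
    print(" ".join(record["DEFINITION"]))
```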

Peter


From jkhilmer at gmail.com  Tue Jan 22 19:41:07 2008
From: jkhilmer at gmail.com (Jonathan Hilmer)
Date: Tue, 22 Jan 2008 12:41:07 -0700
Subject: [BioPython] KEGG Gene parser?
In-Reply-To: <320fb6e00801210715n33093e95t40de5f921fe1fd47@mail.gmail.com>
References: <200801171512.55898.luca.beltrame@unimi.it>
	<320fb6e00801210715n33093e95t40de5f921fe1fd47@mail.gmail.com>
Message-ID: <81277ce10801221141ya4f0d3fr87858102274d6e2e@mail.gmail.com>

Luca,

My lab also has interest in KEGG gene entries.  Although I have
minimal experience in professional Python programming, I would be
happy to help in any way: perhaps testing etc.


Jonathan Hilmer
Bothner Research Group
Montana State University




From bsantos at biocant.pt  Wed Jan 23 17:55:18 2008
From: bsantos at biocant.pt (Bruno Santos)
Date: Wed, 23 Jan 2008 17:55:18 +0000
Subject: [BioPython] Problems runing BLAST
Message-ID: <20080123175518.eab8a089@mail.biocant.pt>

Hi
I used to run blastall without any problems, but now I have moved all my scripts to a server running Fedora Core 6, and I get the following error when parsing the BLAST results:

Traceback (most recent call last):
  File "/usr/local/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py", line 568, in parse
    raise SyntaxError("Your XML file did not start <?xml...")
SyntaxError: Your XML file did not start <?xml...

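I'm running blastall version 2.2.16.
And my code looks like this:
from Bio.Blast import NCBIStandalone, NCBIXML
# my_blast_db and expectation are defined elsewhere in the script
my_blast_file = "file.fasta"
my_blast_exe = "/usr/local/bin/blastall"
# E-value cutoff passed by keyword so it cannot be taken as another positional argument
result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn", my_blast_db, my_blast_file, expectation=expectation)
blast_records = NCBIXML.parse(result_handle)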


From sbassi at gmail.com  Wed Jan 23 19:40:25 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Wed, 23 Jan 2008 17:40:25 -0200
Subject: [BioPython] Problems runing BLAST
In-Reply-To: <20080123175518.eab8a089@mail.biocant.pt>
References: <20080123175518.eab8a089@mail.biocant.pt>
Message-ID: <b43bf2080801231140h6202a260h9f1beaed698a7089@mail.gmail.com>

On Jan 23, 2008 3:55 PM, Bruno Santos <bsantos at biocant.pt> wrote:
>     raise SyntaxError("Your XML file did not start <?xml...")
> SyntaxError: Your XML file did not start <?xml...

Can you show us the result of:
head your_xml_file.xml


From biopython at maubp.freeserve.co.uk  Wed Jan 23 21:07:07 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 23 Jan 2008 21:07:07 +0000
Subject: [BioPython] Problems runing BLAST
In-Reply-To: <b43bf2080801231140h6202a260h9f1beaed698a7089@mail.gmail.com>
References: <20080123175518.eab8a089@mail.biocant.pt>
	<b43bf2080801231140h6202a260h9f1beaed698a7089@mail.gmail.com>
Message-ID: <320fb6e00801231307l5213397ch1c20619b2acc2880@mail.gmail.com>

On 1/23/08, Sebastian Bassi <sbassi at gmail.com> wrote:
> On Jan 23, 2008 3:55 PM, Bruno Santos <bsantos at biocant.pt> wrote:
> >     raise SyntaxError("Your XML file did not start <?xml...")
> > SyntaxError: Your XML file did not start <?xml...
>
> Can you show us the result of:
> head your_xml_file.xml

Seeing the start of the XML file would be very helpful.  And if it is
empty, what has been written to the error handle?  I would guess maybe
the database is in a new location or something simple like that...

print error_info.read()

Another thing to check is the version of Biopython on the new machine.
 Earlier versions would default to asking blast for plain text output
instead of XML.
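
Those two checks can be wrapped up before handing the output to NCBIXML. A minimal sketch follows; check_blast_output is a hypothetical helper, and io.StringIO objects stand in for the result and error handles that NCBIStandalone.blastall returns. Because reading the handle consumes it, the text is returned so it can be wrapped in a fresh StringIO for the parser.

```python
import io

def check_blast_output(result_handle, error_handle):
    """Return the BLAST output text, or raise ValueError showing stderr
    if the output does not start like an XML document."""
    text = result_handle.read()
    if not text.lstrip().startswith("<?xml"):
        errors = error_handle.read()
        raise ValueError("BLAST output is not XML (starts with %r); "
                         "error output was: %r" % (text[:60], errors))
    return text

# Stand-in handles: an empty result plus some text on the error handle.
try:
    check_blast_output(io.StringIO(""), io.StringIO("example error text"))
except ValueError as err:
    print(err)

xml_text = check_blast_output(io.StringIO('<?xml version="1.0"?>\n<BlastOutput/>'),
                              io.StringIO(""))
parser_input = io.StringIO(xml_text)  # fresh handle for the XML parser
```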

Peter


From bsantos at biocant.pt  Fri Jan 25 12:15:56 2008
From: bsantos at biocant.pt (Bruno Santos)
Date: Fri, 25 Jan 2008 12:15:56 -0000
Subject: [BioPython] Problems runing BLAST
In-Reply-To: <320fb6e00801231307l5213397ch1c20619b2acc2880@mail.gmail.com>
References: <20080123175518.eab8a089@mail.biocant.pt>	
	<b43bf2080801231140h6202a260h9f1beaed698a7089@mail.gmail.com>
	<320fb6e00801231307l5213397ch1c20619b2acc2880@mail.gmail.com>
Message-ID: <000301c85f4c$0bd830d0$23889270$@pt>

I wasn't using any XML file as an intermediate; I was parsing the blast results
directly. But it really was a problem with the databases. Now it's solved.

My question now is a different one. I'm blasting a multi-FASTA file, so I need to
know which results belong to which query sequence ID. I know I can simply
assume that the blast results are ordered according to the sequences in the
FASTA file, but is there any other way to obtain the query ID directly using the
Blast Record class?

Thanks in advance,
Bruno Santos



From winter at biotec.tu-dresden.de  Fri Jan 25 13:02:06 2008
From: winter at biotec.tu-dresden.de (Christof Winter)
Date: Fri, 25 Jan 2008 14:02:06 +0100
Subject: [BioPython] Problems runing BLAST
In-Reply-To: <000301c85f4c$0bd830d0$23889270$@pt>
References: <20080123175518.eab8a089@mail.biocant.pt>		<b43bf2080801231140h6202a260h9f1beaed698a7089@mail.gmail.com>	<320fb6e00801231307l5213397ch1c20619b2acc2880@mail.gmail.com>
	<000301c85f4c$0bd830d0$23889270$@pt>
Message-ID: <4799DDCE.1030205@biotec.tu-dresden.de>

Bruno Santos wrote:
> I wasn't using any XML file as intermediate, I was parsing the blast results
> directly. But it was really a problem with the databases. Now it's solved.
> 
> My question now is another one, I'm blasting a multifasta file, so I need to
> know which results belongs to which query sequence ID. I Know I can simply
> assume that the blast result is ordered according to the sequences in the
> fasta file, but is any other away to obtain the query ID directly using the
> Blast Record class?

record.query?

Try exploring your Blast Record instance on a Python shell with the dir function:

 >>> record
<Bio.Blast.Record.Blast instance at 0xb78341cc>
 >>> dir(record)
['__doc__', '__init__', '__module__', '_num_letters_in_database', 'alignments', 
'application', 'blast_cutoff', 'database', 'database_length', 
'database_letters', 'database_name', 'database_sequences', 'date', 
'descriptions', 'dropoff_1st_pass', 'effective_database_length', 
'effective_hsp_length', 'effective_query_length', 'effective_search_space', 
'effective_search_space_used', 'expect', 'filter', 'frameshift', 
'gap_penalties', 'gap_trigger', 'gap_x_dropoff', 'gap_x_dropoff_final', 
'gapped', 'hsps_gapped', 'hsps_no_gap', 'hsps_prelim_gapped', 
'hsps_prelim_gapped_attemped', 'ka_params', 'ka_params_gap', 'matrix', 
'multiple_alignment', 'num_good_extends', 'num_hits', 'num_letters_in_database', 
'num_seqs_better_e', 'num_sequences', 'num_sequences_in_database', 
'posted_date', 'query', 'query_id', 'query_length', 'query_letters', 
'reference', 'sc_match', 'sc_mismatch', 'threshold', 'version', 'window_size']
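As a cross-check that does not depend on Biopython's record classes, the query of each result can also be read straight from the XML. This is a minimal sketch, assuming the NCBI BLAST XML layout in which each query gets its own `<Iteration>` element carrying an `<Iteration_query-def>` description and `<Hit>` children. The function name `hits_by_query` and the sample document are illustrative only.

```python
import xml.etree.ElementTree as ET


def hits_by_query(xml_text):
    """Return [(query_description, [hit_description, ...]), ...] in file order."""
    root = ET.fromstring(xml_text)
    results = []
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def", default="")
        hits = [hit.findtext("Hit_def", default="") for hit in iteration.iter("Hit")]
        results.append((query, hits))
    return results


SAMPLE = """<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>seq1 first read</Iteration_query-def>
      <Iteration_hits>
        <Hit><Hit_def>hit A</Hit_def></Hit>
        <Hit><Hit_def>hit B</Hit_def></Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_query-def>seq2 second read</Iteration_query-def>
      <Iteration_hits></Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""

print(hits_by_query(SAMPLE))
# [('seq1 first read', ['hit A', 'hit B']), ('seq2 second read', [])]
```

Results come back as an ordered list rather than a dict, so two queries with the same description are not silently merged.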

Cheers,
Christof

> 
> Thanks in advance,
> Bruno Santos


From mjldehoon at yahoo.com  Fri Jan 25 13:04:38 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 25 Jan 2008 05:04:38 -0800 (PST)
Subject: [BioPython] Bio.EUtils
Message-ID: <8786.65209.qm@web62404.mail.re1.yahoo.com>

Hello everybody,

I am looking at the various ways Biopython interacts with NCBI's Entrez search engine, with a view to organizing and documenting them a bit better. Currently there are several modules that interact with Entrez. The most extensive one is Bio.EUtils, but there are also simpler modules such as Bio.WWW.NCBI. I was wondering:
1) Is anybody using Bio.EUtils?
2) If so, could you give an example script that uses Bio.EUtils?
So we can get an idea of the amount of overlap between Bio.EUtils and Bio.WWW.NCBI and others.

Thanks!

--Michiel.




From mjldehoon at yahoo.com  Sat Jan 26 05:38:01 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 25 Jan 2008 21:38:01 -0800 (PST)
Subject: [BioPython] Bio.EUtils
In-Reply-To: <d9fd76050801251103lb68260sd0e7d759cdf6b5e5@mail.gmail.com>
Message-ID: <367303.23759.qm@web62406.mail.re1.yahoo.com>

Dear Rohini,
 
 Thank you for your example. It was very helpful.
 Just a few questions about it:
 
 > dbinfo = EUtils.databases['pubmed']
 Is this statement needed? The variable dbinfo is not used in your example, and the example works fine without this statement.
 
 > Then parse the xml or text lines.
  Do you parse the xml or text output yourself, or do you use any Biopython tools for that?
 
 The following does almost the same with Bio.WWW.NCBI instead of Bio.EUtils:
 >>> from Bio.WWW import NCBI
 >>> lines = NCBI.efetch(db='pubmed', id=listids, retmode='xml' ).readlines()
 # or retmode='text'
 I am saying "almost" the same, because currently Bio.WWW.NCBI.efetch does not handle multiple listids (so it accepts listids = '18211820' but not listids = ['18211820', '18211718', '18178374']). However, this can be fixed very easily in Biopython.
 My last question is: Is this sufficient for your needs? Or do you see some advantage to using Bio.EUtils over Bio.WWW.NCBI?
 
 Thanks again,
 
 --Michiel.
 
 
Rohini Damle <rohini.damle at gmail.com> wrote:
Hi,
Here is how I use Bio.EUtils:

from Bio import EUtils
from Bio.EUtils import DBIdsClient

dbinfo = EUtils.databases['pubmed']
# listids is a list of pubmed ids
record = DBIdsClient.from_dbids(EUtils.DBIds("pubmed", listids))
rec2 = record.efetch(retmode="xml", rettype=None).readlines()
# or, to parse the abstract in text format:
# rec2 = record.efetch(retmode="text", rettype="abstract").readlines()

Then parse the xml or text lines.

Thanks
-Rohini.
 

_______________________________________________
BioPython mailing list  -  BioPython at lists.open-bio.org
 http://lists.open-bio.org/mailman/listinfo/biopython




From rjalves at igc.gulbenkian.pt  Mon Jan 28 09:58:50 2008
From: rjalves at igc.gulbenkian.pt (Renato Alves)
Date: Mon, 28 Jan 2008 09:58:50 +0000
Subject: [BioPython] Translation issues
Message-ID: <479DA75A.6070804@igc.gulbenkian.pt>

Hi.

I'm trying to automate and validate the process of translation in 
sequences downloaded from NCBI.

Basically I fetch a GenBank file, extract the DNA sequences and use the
Translation module of BioPython to check whether the result matches the
annotated translation. The problem is that the first amino acid in NCBI's
translation is always M, but with the Translation module it isn't, even if
the codon is marked as a start codon in the corresponding codon table.

For instance, the sequence:

"TTGGATTATTTAATAGAGGGTTTAAGTTATAATCCTGTAGACCACACAGCTACATCTGGACCAACTGTAATGGAAGCTGCACTGATTGCTAA
ACATGTTTATTCAGGGGAAAAAGGAGATGAATTACCCGGTGGATGGAAAATGCTTGAAGATCCATATATGGTTGGAGGTCTTCGAATGGGC
GTATATGGGAGAAAAGGTGAGGATGGAGAGATGGAATATGTAATTGCAAATGCAGGAACAGAACCTACTAGTTTGATAGATTGGGAGAATA
ATTTGAAACAACCTTTTGGGAAATCAGAAGATATGAAAAATTCTTTAGCTTTTGTTGAAGAGTTTATGAAAAACAATCCAAGTATTAATGTAA
CATTTGTTGGACATTCAAAAGGTGGGGCTGAAGCAGCTGCAAATGCGGTACTTACAAATAGGAATGCAATACTATTTAATCCTGCCACAGTG
AACTTAGAATCATATTTAAAGCCATATGGTGTGAACAAGTCAAATTATACTGCTGAGATGACGGCATTTATTGTAGAAGACGAAATTTTGAATA
ATATCTTTGGATTTATATCAACGCCGATAGACAAGGTAGTTTATTTACCCAGACAGCATTCTTTTTTCATATCGATTCCACTTATAGATATGGTA
AATTCGATTCGAAATCATTCGATGGATGCAACGATAAAGGCAATAGAAGAATGGGAGGAAAATAGACAATGA"

with codon table 11 will translate to:

a="LDYLIEGLSYNPVDHTATSGPTVMEAALIAKHVYSGEKGDELPGGWKMLEDPYMVGGLRMGVYGRKGEDGEMEYVIANAGTEPTSLIDWENN
LKQPFGKSEDMKNSLAFVEEFMKNNPSINVTFVGHSKGGAEAAANAVLTNRNAILFNPATVNLESYLKPYGVNKSNYTAEMTAFIVEDEILNNIFG
FISTPIDKVVYLPRQHSFFISIPLIDMVNSIRNHSMDATIKAIEEWEENRQ"

while the translation on the GenBank file is:

b="MDYLIEGLSYNPVDHTATSGPTVMEAALIAKHVYSGEKGDELPGGWKMLEDPYMVGGLRMGVYGRKGEDGEMEYVIANAGTEPTSLIDWENN
LKQPFGKSEDMKNSLAFVEEFMKNNPSINVTFVGHSKGGAEAAANAVLTNRNAILFNPATVNLESYLKPYGVNKSNYTAEMTAFIVEDEILNNIFG
FISTPIDKVVYLPRQHSFFISIPLIDMVNSIRNHSMDATIKAIEEWEENRQ"

causing the test a == b to fail. The sequences are exactly the same with
the exception of the initial amino acid.

I could do the test in other ways and remove the initial letter, but 
that wouldn't work globally.

So, is this the right behavior or am I missing something?

Any other suggestions to do this test will also help.

Thanks
--
Renato Alves


From biopython at maubp.freeserve.co.uk  Mon Jan 28 10:40:28 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 28 Jan 2008 10:40:28 +0000
Subject: [BioPython] Translation issues
In-Reply-To: <479DA75A.6070804@igc.gulbenkian.pt>
References: <479DA75A.6070804@igc.gulbenkian.pt>
Message-ID: <320fb6e00801280240q785d7850g2b48016c7eefd90d@mail.gmail.com>

On 1/28/08, Renato Alves <rjalves at igc.gulbenkian.pt> wrote:
> Hi.
>
> I'm trying to automate and validate the process of translation in
> sequences downloaded from NCBI.  ...  The problem is
> that the starting aminoacid in NCBI is always M but with the Translation
> module isn't, even if the codon is marked as "starting" in the
> corresponding codon table.
>
> So, is this the right behavior or am I missing something?

Sadly, that is just the way the translation module works.  This is a
fairly common problem, and it's one I was planning to try and "fix" as
part of Bug 2381
http://bugzilla.open-bio.org/show_bug.cgi?id=2381

I would like some comments on the ideas on that bug - for example
would you prefer separate methods/functions for blind translation,
translation until a stop codon, and translation from a start codon
which is treated as an M - or a single method with lots of optional
arguments?

> Any other suggestions to do this test will also help.

Right now, I would check the start codon yourself and then use an M
when translating the sequence.  Remember the codon table (table 11 in
your example) should have all the valid start codons defined.
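Peter's suggestion (check the start codon yourself, then use an M) can be sketched in plain Python without Bio.Translate. The codon assignments below are the standard genetic code, which NCBI table 11 shares. The start-codon set is transcribed by hand from NCBI's description of table 11 and is an assumption worth checking against Biopython's own codon table object. The function name `translate_cds` is hypothetical.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Assumed start codons for NCBI table 11 (bacterial/archaeal/plastid).
TABLE11_STARTS = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def translate_cds(dna, start_codons=TABLE11_STARTS, to_stop=True):
    """Translate a coding sequence, rendering a valid start codon as M."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):  # ignore a trailing partial codon
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # "X" for ambiguous codons
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    # An alternative start codon is read as methionine, as in GenBank /translation.
    if protein and dna[:3] in start_codons:
        protein[0] = "M"
    return "".join(protein)


print(translate_cds("TTGGATTATTTAATAGAGGGTTTA"))  # MDYLIEGL
```

Applied to the first 24 bases of the sequence in the original question, this gives `MDYLIEGL`, matching the start of the GenBank translation rather than the `LDYLIEGL` produced by the Translation module.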

Peter


From bsouthey at gmail.com  Mon Jan 28 14:42:22 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Mon, 28 Jan 2008 08:42:22 -0600
Subject: [BioPython] Translation issues
In-Reply-To: <479DA75A.6070804@igc.gulbenkian.pt>
References: <479DA75A.6070804@igc.gulbenkian.pt>
Message-ID: <bbcd77d00801280642t72d7126btdf4f105ea11e6e55@mail.gmail.com>

Hi,
Please see:
http://en.wikipedia.org/wiki/Start_codon
"In addition to AUG, alternative start codons, mainly GUG and UUG are
used in prokaryotes. For example E. coli uses 77% ATG (AUG), 14% GTG
(GUG), 8% TTG (UUG) and a few others."

Really the only way is to compare the sequences after the first
position (a[1:]==b[1:]) assuming you expect an exact match.
Alternatively you need to perform some type of alignment and flag
unexpected differences.
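Bruce's two options can be combined into a small checker that skips the start position and reports every other mismatch, instead of a single True/False. This is a position-by-position comparison, not an alignment, so it only suits translations expected to have the same length. The name `residue_differences` is hypothetical.

```python
def residue_differences(translated, annotated, skip_first=True):
    """Return (position, translated_residue, annotated_residue) per mismatch.

    Position-by-position only; sequences of different length are rejected,
    since comparing those properly would need an alignment.
    """
    if len(translated) != len(annotated):
        raise ValueError("lengths differ: %d vs %d" % (len(translated), len(annotated)))
    start = 1 if skip_first else 0  # position 0 may come from an alternative start codon
    return [
        (i, x, y)
        for i, (x, y) in enumerate(zip(translated, annotated))
        if i >= start and x != y
    ]


print(residue_differences("LDYLIEGL", "MDYLIEGL"))  # []
```

An empty list means the sequences agree apart from the first residue; anything else pinpoints where to look.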

Regards
Bruce



From rjalves at igc.gulbenkian.pt  Mon Jan 28 15:37:57 2008
From: rjalves at igc.gulbenkian.pt (Renato Alves)
Date: Mon, 28 Jan 2008 15:37:57 +0000
Subject: [BioPython] Translation issues
In-Reply-To: <320fb6e00801280240q785d7850g2b48016c7eefd90d@mail.gmail.com>
References: <479DA75A.6070804@igc.gulbenkian.pt>
	<320fb6e00801280240q785d7850g2b48016c7eefd90d@mail.gmail.com>
Message-ID: <479DF6D5.7020709@igc.gulbenkian.pt>

Peter wrote:
> Sadly, that is just the way the translation module works.  This is a
> fairly common problem, and it's one I was planning to try and "fix" as
> part of Bug 2381
> http://bugzilla.open-bio.org/show_bug.cgi?id=2381
>   
In this case, I guess something could test whether the first codon is one
of the codon table's start codons and, if so, translate it as "M". But this
is a very naive and specific approach, and I don't know whether it could
break other uses of this function.
> I would like some comments on the ideas on that bug - for example
> would you prefer separate methods/functions for blind translation,
> translation until a stop codon, and translation from a start codon
> which is treated as an M - or a single method with lots of optional
> arguments?
>   
I don't have the expertise to weigh the pros and cons of the two
approaches.
Still, in terms of user friendliness, I would go for separate
methods/functions to keep each task simple and obvious.
> Right now, I would check the start codon yourself and then use an M
> when translating the sequence.  Remember the codon table (table 11 in
> your example) should have all the valid start codons defined.
>   
I'm adopting the technique suggested by Bruce Southey to work around this
particular problem. Still, it wouldn't work for more elaborate cases
like some of the ones described in the bug thread you mentioned.

Still, many thanks for the quick and clean answers.

Renato


From rjalves at igc.gulbenkian.pt  Mon Jan 28 17:42:05 2008
From: rjalves at igc.gulbenkian.pt (Renato Alves)
Date: Mon, 28 Jan 2008 17:42:05 +0000
Subject: [BioPython] Alphabet Checking
Message-ID: <479E13ED.2080908@igc.gulbenkian.pt>

/var/lib/python-support/python2.4/Bio/Translate.py in translate_to_stop(self, seq)
     34     def translate_to_stop(self, seq):
     35         # This doesn't have a stop encoding
---> 36         assert seq.alphabet == self.table.nucleotide_alphabet, \
     37                "cannot translate from given alphabet (have %s, need %s)" %\
     38                (seq.alphabet, self.table.nucleotide_alphabet)

AssertionError: cannot translate from given alphabet (have IUPACAmbiguousDNA(), need IUPACAmbiguousDNA())

Aren't those two exactly equal?

Matching references doesn't seem to work as expected :(

What I did:

from Bio.Alphabet.IUPAC import IUPACAmbiguousDNA
from Bio import Translate
from Bio import Seq

a=Seq.Seq("ATCGGATGA...ATGCAGT",alphabet=IUPACAmbiguousDNA())
b=Translate.ambiguous_dna_by_id[11]

b.translate_to_stop(a) ... error pops out

The only way around I was able to find is:

b.table.nucleotide_alphabet=a.alphabet

I guess this is a bad day :( it's the second clash with the Translate 
module in the same day :|

Should I report this as bug?


From p.j.a.cock at googlemail.com  Mon Jan 28 17:56:11 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 28 Jan 2008 17:56:11 +0000
Subject: [BioPython] Alphabet Checking
In-Reply-To: <479E13ED.2080908@igc.gulbenkian.pt>
References: <479E13ED.2080908@igc.gulbenkian.pt>
Message-ID: <320fb6e00801280956m4dec2c1eu79c89396e8a4f72f@mail.gmail.com>

> Aren't those two exactly equal?
>
> Matching references doesn't seem to work as expected :(

That does look like a bug...

> The only way around I was able to find is:

Another option,

from Bio import Translate
from Bio import Seq

trans=Translate.ambiguous_dna_by_id[11]
a=Seq.Seq("ATCGGATGAATGCAGT",alphabet=trans.table.nucleotide_alphabet)
print trans.translate_to_stop(a)
print trans.translate(a)

> I guess this is a bad day :( it's the second clash with the Translate
> module in the same day :|

I don't like the Bio.Translate module either.

> Should I report this as bug?

Please do.  If we do just add translation to the seq object (bug 2381)
and deprecate the Bio.Translate module then in a sense this problem
goes away ;)

Peter


From tiagoantao at gmail.com  Mon Jan 28 18:10:56 2008
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Mon, 28 Jan 2008 18:10:56 +0000
Subject: [BioPython] Alphabet Checking
In-Reply-To: <320fb6e00801280956m4dec2c1eu79c89396e8a4f72f@mail.gmail.com>
References: <479E13ED.2080908@igc.gulbenkian.pt>
	<320fb6e00801280956m4dec2c1eu79c89396e8a4f72f@mail.gmail.com>
Message-ID: <6d941f120801281010r6e8e829dub26a85e6a0b61983@mail.gmail.com>

On Jan 28, 2008 5:56 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> > Aren't those two exactly equal?
> >
> > Matching references doesn't seem to work as expected :(
>
> That does look like a bug...

It is probably completely unrelated, but it might not be...

Looking at the code from a "helicopter view", I have noticed that SeqIO uses
Nexus in some cases.

I have patched a previous Nexus bug by using deepcopy, which could
cause something like this:

AssertionError: cannot translate from given alphabet (have
IUPACAmbiguousDNA(), need IUPACAmbiguousDNA())

(ie, it has the same type name, but is really not the same object)
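The kind of failure Tiago describes can be reproduced in plain Python: when a class does not define `__eq__`, `==` falls back to object identity, so a deep copy prints the same repr yet compares unequal. The two classes below are hypothetical stand-ins, not Biopython's alphabet classes; the second shows one way to make equality depend on the class rather than on the particular instance.

```python
import copy


class IdentityAlphabet:
    """No __eq__ defined, so == compares object identity."""

    def __repr__(self):
        return "%s()" % type(self).__name__


class ValueAlphabet(IdentityAlphabet):
    """Instances of exactly the same class compare equal."""

    def __eq__(self, other):
        return type(self) is type(other)

    def __hash__(self):
        return hash(type(self))


original = IdentityAlphabet()
clone = copy.deepcopy(original)
print(repr(original), repr(clone), original == clone)
# IdentityAlphabet() IdentityAlphabet() False

fixed = ValueAlphabet()
print(fixed == copy.deepcopy(fixed))  # True
```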

Again, it is probably unrelated (I know very little about Bio.Seq and
Bio.SeqIO), but, just in case...


From mjldehoon at yahoo.com  Tue Jan 29 00:43:22 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 28 Jan 2008 16:43:22 -0800 (PST)
Subject: [BioPython] Bio.EUtils
In-Reply-To: <d9fd76050801281054q1ba7a498yc54f2ca24d3f8d5d@mail.gmail.com>
Message-ID: <356164.74184.qm@web62403.mail.re1.yahoo.com>


Rohini Damle <rohini.damle at gmail.com> wrote:
> > The following does almost the same with Bio.WWW.NCBI instead of Bio.EUtils:
> > ...
> > My last question is: Is this sufficient for your needs? Or do you see some advantage to using Bio.EUtils over Bio.WWW.NCBI?
>
> I guess Bio.EUtils is faster and can be used for batch processing (like fetching records for a list of pubmed ids). I have not tried Bio.WWW.NCBI; I will try it and get back to you.

If you make the following modification in Bio.WWW.NCBI.py:
line 189: replace 
    options = urllib.urlencode(params)
by
    options = urllib.urlencode(params, doseq=1)
then Bio.WWW.NCBI can also fetch records for a list of pubmed ids. I'm guessing that then it is as fast as (or faster than) Bio.EUtils, but I'd be interested in what you find in practice.
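The effect of `doseq` can be seen without contacting NCBI. The sketch uses Python 3's `urllib.parse.urlencode`, which behaves like the Python 2 `urllib.urlencode` call above. The IDs are the example PubMed IDs from earlier in the thread, and the comma-separated variant reflects how the E-utilities documentation describes the `id` parameter.

```python
from urllib.parse import urlencode  # Python 3 home of urllib.urlencode

params = {"db": "pubmed", "id": ["18211820", "18211718", "18178374"]}

# Without doseq the list is converted with str() and sent as one garbled value.
print(urlencode(params))
# db=pubmed&id=%5B%2718211820%27%2C+%2718211718%27%2C+%2718178374%27%5D

# With doseq each list element becomes its own id=... pair.
print(urlencode(params, doseq=True))
# db=pubmed&id=18211820&id=18211718&id=18178374

# Alternative: a single comma-separated id value.
print(urlencode({"db": "pubmed", "id": ",".join(params["id"])}))
# db=pubmed&id=18211820%2C18211718%2C18178374
```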

Thanks,

--Michiel




From bsantos at biocant.pt  Tue Jan 29 11:34:07 2008
From: bsantos at biocant.pt (Bruno Santos)
Date: Tue, 29 Jan 2008 11:34:07 -0000
Subject: [BioPython] Problems runing BLAST
Message-ID: <000101c8626a$dd98e760$98cab620$@pt>

I am once more having problems running blast using biopython. When I start the
script, the blastall process starts, but after a few minutes it goes to sleep
and no error message is passed. When I check the XML file, it contains only
part of the results for the first sequence.
Has anyone ever had the same problem?

I'm using:
python 2.5.1
biopython 1.44
blastall 2.2.16

My code is the following:

from Bio import SeqIO
from Bio.Blast import NCBIStandalone
from Bio.Blast import NCBIXML
import time
import os


primer = 'D2'
sample = 'AGC'

#Defines all the databases that will be used
my_blast_db = ('"/home/bsantos/DataBases/nt.00 '
               '/home/bsantos/DataBases/nt.01 /home/bsantos/DataBases/nt.02 '
               '/home/bsantos/DataBases/nt.03 /home/bsantos/DataBases/nt.04 '
               '/home/bsantos/DataBases/nt.05 /home/bsantos/DataBases/RDPIIdb '
               '/home/bsantos/DataBases/RNADB"')
print my_blast_db


#Define the fasta file to Blast
destination = ('/home/bsantos/Metagenomics/Results/' + sample + '/' + primer +
               '/filteredfile_sample' + sample + '_' + primer + '_F.fasta')
my_blast_file = (destination)

#Defines the blast binaries
my_blast_exe = "/usr/local/bin/blastall"

print (os.path.exists(my_blast_exe))
print time.ctime()

#Performs Blast
print 'Now Performing Blast'
result_handle, error_info = NCBIStandalone.blastall(my_blast_exe,
"blastn",my_blast_db, my_blast_file)
print 'This errors have occured:'
print error_info.read()
print 'Starting parsing the results.......'
#Parse the result of the blast in XML format
blast_results = result_handle.read() #Catch the results
save_file = open('/home/bsantos/Metagenomics/Results/' + sample + '/' +
primer + '/BlastReport_sample' + sample + '_' + primer + '_F.xml', 'w')
save_file.write(blast_results) #Write all the information to an XML file
save_file.flush()
save_file.close()


From biopython at maubp.freeserve.co.uk  Tue Jan 29 12:15:26 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 29 Jan 2008 12:15:26 +0000
Subject: [BioPython] Problems runing BLAST
In-Reply-To: <000101c8626a$dd98e760$98cab620$@pt>
References: <000101c8626a$dd98e760$98cab620$@pt>
Message-ID: <320fb6e00801290415g10e099dj108ecea15a72109c@mail.gmail.com>

Hi Bruno,

On Jan 29, 2008 11:34 AM, Bruno Santos <bsantos at biocant.pt> wrote:
> I am once more having problems running blast using biopython. I start the
> script the blastall process starts and after a few minutes it starts
> sleeping and no error message is passed. When I check the xml file it only
> writes part of the results for the first sequence.

Have you tried running the same command "by hand" at the command line,
to check that it works, and to time how long you should expect it to
take?

> Does anyone has ever had the same problem?

I think the problem is to do with asking the operating system to read
all the error output.  Try commenting out this bit, and only read the
error handle if you have a problem:

# print error_info.read()

Quoting from the tutorial,

>> The error info can be hard to deal with, because if you try to do a
>> error_handle.read() and there was no error info returned, then the
>> read() call will block and not return, locking your script. In my
>> opinion, the best way to deal with the error is only to print it out
>> if you are not getting result_handle results to be parsed, but
>> otherwise to leave it alone.
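A general way to avoid this kind of blocking, outside Biopython's `NCBIStandalone.blastall`, is to let the standard library drain both pipes at once. Below is a minimal sketch using `subprocess.Popen.communicate()`. The child process is a stand-in script (an assumption, in place of a real blastall command line) that writes a large amount to stderr before producing its XML on stdout.

```python
import subprocess
import sys

# Stand-in child process: lots of stderr output, then a line of XML on stdout.
CHILD = (
    "import sys\n"
    "sys.stderr.write('warning\\n' * 20000)\n"
    "sys.stdout.write('<?xml version=\"1.0\"?>\\n')\n"
)


def run_and_capture(args):
    """Run a command, collecting stdout and stderr together."""
    proc = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # communicate() reads both pipes concurrently and waits for the exit,
    # so a full stderr pipe cannot stall the child.
    out, err = proc.communicate()
    return proc.returncode, out, err


code, out, err = run_and_capture([sys.executable, "-c", CHILD])
print(code, out.startswith("<?xml"), err.count("warning"))  # 0 True 20000
```

The trade-off is that `communicate()` holds all output in memory, which is fine for error text but worth keeping in mind for very large result files.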

Peter


From jblanca at btc.upv.es  Wed Jan 30 09:15:49 2008
From: jblanca at btc.upv.es (Jose Blanca)
Date: Wed, 30 Jan 2008 10:15:49 +0100
Subject: [BioPython] blast parse
Message-ID: <200801301015.50812.jblanca@btc.upv.es>

Hi:
I'm new to the list and to biopython. I come from perl and I'm liking python a
lot.
I'm trying to read a big blast file and it takes a lot of time and memory. I'm
not sure if I'm taking the most efficient path. Basically I'm doing:

blasth = open('blast.xml', 'r')
from Bio.Blast import NCBIXML
p = NCBIXML.BlastParser()
blast_parse = p.parse(blasth)
for blast_result in blast_parse:
    pass  # do whatever

I was expecting to read the records one by one, but the call to 
p.parse(blasth) takes a lot of time and memory. I'm not sure about what this 
function returns, a list or an iterator. I've looked at the NCBIXML.py file 
and the BlastParser class has two parse methods (am I wrong?).

    def parse(self, handler):
        """Parses the XML data

        handler -- file handler or StringIO

        This method returns a list of Blast record objects.
        """

def parse(handle, debug=0):
    """Returns an iterator a Blast record for each query.

    handle - file handle to and XML file to parse
    debug - integer, amount of debug information to print

    This is a generator function that returns multiple Blast records
    objects - one for each query sequence given to blast.  The file
    is read incrementally, returning complete records as they are read
    in.

I guess that the first function would read the complete file before returning 
anything, but the second should return and read the records one by one. I 
don't know if this guess is correct.
Is there another way to read these huge blast files without using so much
memory?
Best regards,

-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)


From mjldehoon at yahoo.com  Wed Jan 30 09:56:56 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 30 Jan 2008 01:56:56 -0800 (PST)
Subject: [BioPython] blast parse
In-Reply-To: <200801301015.50812.jblanca@btc.upv.es>
Message-ID: <940738.9737.qm@web62407.mail.re1.yahoo.com>

Dear Jose,

To get the records one-by-one, use

from Bio.Blast import NCBIXML
blasth = open('blast.xml')
blast_parse = NCBIXML.parse(blasth)
for blast_result in blast_parse:
    pass  # do whatever with blast_result

This avoids having to read the complete XML file all at once.
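The same one-record-at-a-time idea can be sketched with the standard library's `xml.etree.ElementTree.iterparse`, clearing each finished `<Iteration>` so that memory does not grow with the file. This assumes the NCBI BLAST XML layout (one `<Iteration>` per query) and illustrates incremental parsing in general, not how Bio.Blast.NCBIXML is implemented. `iter_query_hit_counts` is a hypothetical name.

```python
import io
import xml.etree.ElementTree as ET


def iter_query_hit_counts(handle):
    """Yield (query_def, number_of_hits) for each <Iteration>, one at a time."""
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == "Iteration":
            query = elem.findtext("Iteration_query-def", default="")
            yield query, len(elem.findall("Iteration_hits/Hit"))
            # Drop the finished subtree so memory use stays roughly flat.
            elem.clear()


SAMPLE = b"""<?xml version="1.0"?>
<BlastOutput><BlastOutput_iterations>
<Iteration><Iteration_query-def>q1</Iteration_query-def>
<Iteration_hits><Hit><Hit_def>a</Hit_def></Hit><Hit><Hit_def>b</Hit_def></Hit></Iteration_hits>
</Iteration>
<Iteration><Iteration_query-def>q2</Iteration_query-def><Iteration_hits/></Iteration>
</BlastOutput_iterations></BlastOutput>"""

for query, n_hits in iter_query_hit_counts(io.BytesIO(SAMPLE)):
    print(query, n_hits)  # "q1 2", then "q2 0"
```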

To the developers:
We should probably think about removing the NCBIXML.BlastParser.parse, and perhaps adding a NCBIXML.read function to read exactly one record from the XML file.
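A `read` function of the kind proposed could be a thin wrapper over any record iterator. Below is a minimal sketch; `read_single` is a hypothetical name, not an existing Biopython function.

```python
def read_single(records):
    """Return the only item in `records`, raising ValueError otherwise."""
    iterator = iter(records)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("No records found") from None
    # Any further item means the input held more than one record.
    for _extra in iterator:
        raise ValueError("More than one record found")
    return first
```

Used as, for example, `record = read_single(NCBIXML.parse(handle))`, it fails loudly instead of silently ignoring extra queries.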

--Michiel.



From lueck at ipk-gatersleben.de  Wed Jan 30 10:24:55 2008
From: lueck at ipk-gatersleben.de (=?iso-8859-1?Q?Stefanie_L=FCck?=)
Date: Wed, 30 Jan 2008 11:24:55 +0100
Subject: [BioPython] Clustalw pair wise alignment
Message-ID: <000d01c8632a$5bcbac70$1022a8c0@ipkgatersleben.de>

Hi!

 
I'm working with clustalw and everything works fine. Now I have some questions:

 
1) Must the input data be in a file or can it also be in the code (e.g. in a list)?

 
2) I ask because I want to do many (up to hundreds of) pairwise alignments of short sequences, and I don't want to store each pair in a separate file.

If I put them all in one file, clustalw makes a multiple alignment:

 
Match1  ------CAAGATTTGAGCACCACAGGCAA---
full1   ------CAAGATTTGAGCACCACAGGCAACAG
Match0  AGCCTTCAAGATTTGAGCACCACAG-------
full0   AGCCTTCAAGATTTGAGCACCACAG-------

 
whereas Match1 should only align to full1 and so on.

 
Could someone give a hint?


Regards 

Stefanie

 
From biopython at maubp.freeserve.co.uk  Wed Jan 30 11:47:42 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 30 Jan 2008 11:47:42 +0000
Subject: [BioPython] Clustalw pair wise alignment
In-Reply-To: <000d01c8632a$5bcbac70$1022a8c0@ipkgatersleben.de>
References: <000d01c8632a$5bcbac70$1022a8c0@ipkgatersleben.de>
Message-ID: <320fb6e00801300347h6f1ec197qc599ec9f2c80bab@mail.gmail.com>

Hi Stefanie

> I'm working with clustalw and everything works fine. Now I have some questions:
>
> 1) Must the input data be in a file or can it also be in the code (e.g. in a list)?

I believe for the Clustalw command line tool, you have to supply the
input data in a file.

> 2) I ask because I want to do many (up to hundreds of) pairwise
> alignments of short sequences, and I don't want to store
> each pair in a separate file.
>
> If I put them all in one file, clustalw makes a multiple alignment:

Yes, that is expected for clustalw.

> Could someone give a hint?

If you want to use Clustalw, you could re-use a temporary file for
each pair of sequences (rather than creating hundreds of different
input files).
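
A minimal sketch of the temporary-file approach, assuming a `clustalw` binary on the PATH that accepts `-INFILE=` and `-OUTFILE=` options (pass e.g. `clustalw="clustalw2"` if your installation uses that name). The helper names `write_pair_fasta` and `align_pairs` are hypothetical; one FASTA file inside a temporary directory is overwritten for every pair, so only one input file ever exists.

```python
import os
import subprocess
import tempfile


def write_pair_fasta(path, pair):
    """Overwrite `path` with a two-record FASTA file for one (name, seq) pair."""
    with open(path, "w") as out:
        for name, seq in pair:
            out.write(f">{name}\n{seq}\n")


def align_pairs(pairs, clustalw="clustalw"):
    """Align each pair separately, re-using the same temporary input file.

    Yields the text of each ClustalW alignment file. The temporary
    directory (including any guide-tree files clustalw writes) is
    deleted when the generator finishes.
    """
    with tempfile.TemporaryDirectory() as tmp:
        infile = os.path.join(tmp, "pair.fasta")
        outfile = os.path.join(tmp, "pair.aln")
        for pair in pairs:
            write_pair_fasta(infile, pair)  # same file for every pair
            subprocess.run(
                [clustalw, f"-INFILE={infile}", f"-OUTFILE={outfile}"],
                check=True, stdout=subprocess.DEVNULL,
            )
            with open(outfile) as aln:
                yield aln.read()
```

Usage would look like `for aln in align_pairs([[("Match1", seq1), ("full1", seq2)], ...]): print(aln)`, where each inner list holds exactly the two sequences to be aligned.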

I would consider using the EMBOSS tools "needle" (global alignment) or
"water" (local alignment) for pairwise alignments.  These have the
advantage that you can supply the sequences directly on the command line
(provided they are not too long).  See http://emboss.sourceforge.net/apps/ and also
http://emboss.sourceforge.net/docs/themes/UniformSequenceAddress.html#asis
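
A minimal sketch of calling needle or water this way from Python, assuming the EMBOSS programs are on the PATH. The `asis::` prefix passes each sequence literally; the gap penalties (10.0 and 0.5 here, an assumption, not a recommendation) and `-auto` are supplied so the program does not prompt. The function names are hypothetical.

```python
import subprocess


def emboss_pair_command(seq_a, seq_b, program="needle",
                        gapopen=10.0, gapextend=0.5):
    """Build a needle/water command line with both sequences given inline."""
    return [
        program,
        "-asequence", f"asis::{seq_a}",
        "-bsequence", f"asis::{seq_b}",
        "-gapopen", str(gapopen),
        "-gapextend", str(gapextend),
        "-outfile", "stdout",  # write the alignment report to stdout
        "-auto",               # turn off interactive prompting
    ]


def align_pair(seq_a, seq_b, program="needle"):
    """Run needle or water and return its text report."""
    result = subprocess.run(
        emboss_pair_command(seq_a, seq_b, program),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, `print(align_pair("CAAGATTTGAGCACCACAGGCAA", "CAAGATTTGAGCACCACAGGCAACAG"))` would align one Match/full pair without writing any input files.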

Peter


From winter at biotec.tu-dresden.de  Wed Jan 30 12:48:34 2008
From: winter at biotec.tu-dresden.de (Christof Winter)
Date: Wed, 30 Jan 2008 13:48:34 +0100
Subject: [BioPython] blast parse
In-Reply-To: <940738.9737.qm@web62407.mail.re1.yahoo.com>
References: <940738.9737.qm@web62407.mail.re1.yahoo.com>
Message-ID: <47A07222.9000200@biotec.tu-dresden.de>

Michiel de Hoon wrote:
> Dear Jose,
> 
> To get the records one-by-one, use
> 
> from Bio.Blast import NCBIXML
> blast_parse = NCBIXML.parse(blasth)
> for blast_result in blast_parse:
>     # do whatever with blast_result
>     pass
> 
> This avoids having to read the complete XML file all at once.
> 
> To the developers: We should probably think about removing
> NCBIXML.BlastParser.parse, and perhaps adding an NCBIXML.read function to read
> exactly one record from the XML file.
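
The "read exactly one record" behaviour proposed above can be sketched generically on top of any record iterator such as the one `NCBIXML.parse` returns. `read_exactly_one` is a hypothetical helper, not part of Biopython; it simply raises `ValueError` when there are zero records or more than one.

```python
def read_exactly_one(records):
    """Return the single item from an iterable of records.

    Raises ValueError if the iterable is empty or has more than one item.
    Only the first two items are ever consumed.
    """
    iterator = iter(records)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("No records found") from None
    try:
        next(iterator)
    except StopIteration:
        return first
    raise ValueError("More than one record found")
```

With Biopython installed, usage would be along the lines of `record = read_exactly_one(NCBIXML.parse(handle))`.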

I think removing NCBIXML.BlastParser.parse is a good idea.
We should keep it simple.

Christof