From arareko at campus.iztacala.unam.mx Sat Mar 3 17:32:46 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Sat, 03 Mar 2007 16:32:46 -0600 Subject: [BioPython] [Bioperl-l] New Article on Approaches to Web Development for Bioinformatics In-Reply-To: <1ad8057e0703021842y683853f5k1c97dbf362f20dda@mail.gmail.com> References: <1ad8057e0703021842y683853f5k1c97dbf362f20dda@mail.gmail.com> Message-ID: <45E9F78E.8040406@campus.iztacala.unam.mx> Hi Alex, I think you've put together a very nice & concise introductory article. I'd like to comment a little on some sections I've read: * Introduction > "Given that you have an idea for analyzing or presenting data in a > particular was, a complete bioinformatics web application depends of > these basic pieces, which is what this article is all about: > > 1. A source of data... > 2. An application programming language... > 3. A web application platform... > 4. Optionally, a data store... > 5. Optionally, you would reuse software tools..." Even though you make a small mention of Web Services at the very end of the article (under Application Integration -> Programmatic Integration), I believe that Web Services can be another optional (or even basic) piece of a web application. In fact, many web applications consist only of Web Services without HTML user interfaces. * Application Development Languages > "There are many different programming platforms and tools available to > solve bioinformatics problems. It can be bewildering at first, but it > makes more sense to build on top of some of these tools rather than > build from scratch. Some the problems with using these tools for a > bioinformatics portal are > > 1. Many tools are written... > 2. Some tools have particular prerequisites... > 3. Many may not be in a form... > 4. The context that gives meaning...
> > Standardization on a particular platform can help manageability but > for most organizations a compromise between standardization and > adoption of several different platforms will allow many people to > develop software in platforms that they are already comfortable with > and allow the reuse of a large amount of freely available software..." I would add to the problems list the fact that building web (or other kinds of) applications on top of a platform whose codebase is constantly evolving can make them very difficult to maintain. The case of EnsEMBL comes to my mind here: they opted to stick with BioPerl 1.2.3 as a core library and haven't moved to a newer version of it because the EnsEMBL code is so vast that a simple upgrade of BioPerl would break a lot of their code. AFAIK, it's because of this and the slowness of some parts of BioPerl that EnsEMBL is gradually saying goodbye to BioPerl. Also, I think that depending on the amount of available code you plan to import into your application, sometimes having a whole platform at the very bottom can add unnecessary extra weight to your application. More weight can mean less speed, which is critical in web development. * Application Integration -> Navigation > "The basic way that users will navigate into and around your > application should be using HTTP GET and POST requests with specific > URL's. Users bookmark these URL's and other applications will link to > them. Most applications developers did not realize it at first, but > these URL's are, in fact, an interface into your application that you > must maintain in a consistent way as you change and evolve your > software. Otherwise, they will find dead links..." Just as I clicked the bookmark button for your article :) The same principle could apply to its filenames. A URL of the form: http://medicalcomputing.net/tools_dna17.php is less indicative of the real content of the article and can mislead potential readers.
Optimising the URL's will make them easier for search engines to index, something like: http://medicalcomputing.net/web-development-bioinformatics17.php would do the trick. To conclude my comments, I was surprised to see a section about BioPHP and not about other better-known toolkits like BioPython or BioRuby. What about their role in web development? Python is also a common language for web programming and with all the recent *hot* stuff like Ruby on Rails, it's very likely that both Bio* toolkits are more than ready for deploying web applications. I'm Cc'ing this to their respective mailing lists to see if someone wants to give you some feedback about them in order to complement your article. Other than that, I really liked your work :) Cheers, Mauricio. Alex Amies wrote: > I have written an article on Approaches to Web Development for > Bioinformatics at > > http://medicalcomputing.net/tools_dna1.php > > There is a fairly large section on BioPerl at > > http://medicalcomputing.net/tools_dna13.php > > I hope that someone gets something useful out of it. I also looking for > feedback on it and, in particular, please let me know about any mistakes in > it. > > The intent of the article is to give an overview of various approaches to > developing web based tools for bioinformatics. It describes the alternatives > at each layer of the system, including the data layer and sources of data, > the application programming layer, the web layer, and bioinformatics tools > and software libraries.
> > Alex > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From alexamies at gmail.com Sat Mar 3 22:09:51 2007 From: alexamies at gmail.com (Alex Amies) Date: Sat, 3 Mar 2007 19:09:51 -0800 Subject: [BioPython] [Bioperl-l] New Article on Approaches to Web Development for Bioinformatics In-Reply-To: <45E9F78E.8040406@campus.iztacala.unam.mx> References: <1ad8057e0703021842y683853f5k1c97dbf362f20dda@mail.gmail.com> <45E9F78E.8040406@campus.iztacala.unam.mx> Message-ID: <1ad8057e0703031909v4880f5f1t3c4159b75c36bcca@mail.gmail.com> Mauricio, Thanks for your comments. You are right that I could have said a lot more about web services. I plan on doing that but I haven't got there yet. Actually, with all the hype about web services I have been surprised to find the programming model so complicated. As you mention, I certainly could have thought out my own URL's better. I have been surprised not to find more PHP activity in bioinformatics. To me, besides being a lightweight and pleasant language to program in, it is incredibly economical for hosting Internet applications and there is a huge open source community around PHP in general. The same can be said of Perl. It is because of my own ignorance and lack of time that I have not investigated Python and Ruby. I may do so in the future and write about them. Alex On 3/3/07, Mauricio Herrera Cuadra wrote: > Hi Alex, > > I think you've put a very nice & concise introductory article.
I'd like > to comment a little on some sections I've read: > > * Introduction > > > "Given that you have an idea for analyzing or presenting data in a > > particular was, a complete bioinformatics web application depends of > > these basic pieces, which is what this article is all about: > > > > 1. A source of data... > > 2. An application programming language... > > 3. A web application platform... > > 4. Optionally, a data store... > > 5. Optionally, you would reuse software tools..." > > Even though you do a small mention about Web Services at the very end of > the article (under Application Integration -> Programmatic Integration), > I believe that Web Services can be another optional (or even basic) > piece of a web application. In fact, many web applications consist only > of Web Services without HTML user interfaces. > > * Application Development Languages > > > "There are many different programming platforms and tools available to > > solve bioinformatics problems. It can be bewildering at first, but it > > makes more sense to build on top of some of these tools rather than > > build from scratch. Some the problems with using these tools for a > > bioinformatics portal are > > > > 1. Many tools are written... > > 2. Some tools have particular prerequisites... > > 3. Many may not be in a form... > > 4. The context that gives meaning... > > > > Standardization on a particular platform can help manageability but > > for most organizations a compromise between standardization and > > adoption of several different platforms will allow many people to > > develop software in platforms that they are already comfortable with > > and allow the reuse of a large amount of freely available software..." > > I would add to the problems list the fact that building web (or other > kind of) applications on top of a platform whose codebase is evolving > constantly, can make them very difficult to maintain. 
The case of > EnsEMBL comes to my mind here: they opted to stick with BioPerl 1.2.3 as > a core library and haven't moved onto a higher version of it because the > EnsEMBL code is so vast, that a simple upgrade of BioPerl would break a > lot of their code. AFAIK, it's because of this and the slowness at some > parts of BioPerl that EnsEMBL is gradually saying goodbye to BioPerl. > > Also, I think that depending on the amount of available code you plan to > import into your application, sometimes having a whole platform at the > very bottom can add unnecessary extra weight to your application. More > weight could be equal to less speed, this is critical in web development. > > * Application Integration -> Navigation > > > "The basic way that users will navigate into and around your > > application should be using HTTP GET and POST requests with specific > > URL's. Users bookmark these URL's and other applications will link to > > them. Most applications developers did not realize it at first, but > > these URL's are, in fact, an interface into your application that you > > must maintain in a consistent way as you change and evolve your > > software. Otherwise, they will find dead links..." > > Just as I clicked the bookmark button for your article :) The same > principle could apply to its filenames. A URL of the form: > http://medicalcomputing.net/tools_dna17.php is less indicative of the > real content of the article and can mislead potential readers. > Optimising the URL's will make them better to be indexed by search > engines, something like: > http://medicalcomputing.net/web-development-bioinformatics17.php would > do the trick. > > To conclude my comments, I was surprised to see a section about BioPHP > and not about other more-known toolkits like BioPython or BioRuby. What > about their role in web development? 
Python is also a common language > for web programming and with all the recent *hot* stuff like Ruby On > Rails, it's very likely that both Bio* toolkits are more than ready for > deploying web applications. I'm Cc'ing this to their respective mailing > lists to see if someone wants to give you some feedback about them in > order to complement your article. Other than that, I really liked your > work :) > > Cheers, > Mauricio. > > Alex Amies wrote: > > I have written an article on Approaches to Web Development for > > Bioinformatics at > > > > http://medicalcomputing.net/tools_dna1.php > > > > There is a fairly large section on BioPerl at > > > > http://medicalcomputing.net/tools_dna13.php > > > > I hope that someone gets something useful out of it. I also looking for > > feedback on it and, in particular, please let me know about any mistakes in > > it. > > > > The intent of the article is to give an overview of various approaches to > > developing web based tools for bioinformatics. It describes the alternatives > > at each layer of the system, including the data layer and sources of data, > > the application programming layer, the web layer, and bioinformatics tools > > and software libraries. 
> > > > Alex > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > MAURICIO HERRERA CUADRA > arareko at campus.iztacala.unam.mx > Laboratorio de Genética > Unidad de Morfofisiología y Función > Facultad de Estudios Superiores Iztacala, UNAM > > > > From shahs at MIT.EDU Sat Mar 3 23:14:59 2007 From: shahs at MIT.EDU (Hossein Shahsavari) Date: Sat, 03 Mar 2007 23:14:59 -0500 Subject: [BioPython] IOError: [Errno 2] No such file or directory: Message-ID: <20070303231459.s0pe4qpb1128o0gw@webmail.mit.edu> Hello, I receive the following error when I am trying to access a file called HISTORY from another file with this command template = '~/CSH/HISTORY' and I get this error. IOError: [Errno 2] No such file or directory: '~/CSH/HISTORY' I use python in a Linux environment. I appreciate any suggestions/comments. Hossein Shahsavari From biopython at maubp.freeserve.co.uk Sun Mar 4 06:58:07 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 04 Mar 2007 11:58:07 +0000 Subject: [BioPython] IOError: [Errno 2] No such file or directory: In-Reply-To: <20070303231459.s0pe4qpb1128o0gw@webmail.mit.edu> References: <20070303231459.s0pe4qpb1128o0gw@webmail.mit.edu> Message-ID: <45EAB44F.1080909@maubp.freeserve.co.uk> Hossein Shahsavari wrote: > Hello, > > I receive the following error when I am trying to access a file called HISTORY > from another file with this command > > template = '~/CSH/HISTORY' > > and I get this error. > > IOError: [Errno 2] No such file or directory: '~/CSH/HISTORY' > > I use python in a Linux environment. I appreciate any suggestions/comments. If you had posted the python code it would be easier to guess what is going wrong. What does this do? import os template = '~/CSH/HISTORY' print os.path.isfile(template) That should print either True or False.
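Another option is to let Python expand the tilde itself with os.path.expanduser, which does what the shell normally does for you. A small sketch using only the standard library (the CSH/HISTORY path is just the one from your message):

```python
import os

# Expand the leading '~' into the real home directory, the way the shell would.
template = os.path.expanduser('~/CSH/HISTORY')
# template is now an absolute path such as '/home/username/CSH/HISTORY'
```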
You might also try replacing the tilde ('~') with the actual path of your home folder, something like this typically: template = '/home/username/CSH/HISTORY' P.S. Have you checked the case? Linux and Unix are case sensitive. Peter From shahs at MIT.EDU Sun Mar 4 11:57:34 2007 From: shahs at MIT.EDU (Hossein Shahsavari) Date: Sun, 04 Mar 2007 11:57:34 -0500 Subject: [BioPython] IOError: [Errno 2] No such file or directory: In-Reply-To: <45EAB44F.1080909@maubp.freeserve.co.uk> References: <20070303231459.s0pe4qpb1128o0gw@webmail.mit.edu> <45EAB44F.1080909@maubp.freeserve.co.uk> Message-ID: <20070304115734.xe5ms6c7vkv8k4wk@webmail.mit.edu> Hi Thanks for your guidance. The problem was the tilde ('~') which I replaced with the correct path and now it works. I have another maybe simple question: I have 26 files, namely output1, output2,...,output26. I can read them one by one but how can I read them all in an easier way, like a loop? I put a "for loop" by setting i=1 for i in range(1,27) template='outputi' however, I got the same error as above IOError: [Errno 2] No such file or directory: 'outputi'. It seems "i" can't be attached to the output. Thanks a lot Hossein From biopython at maubp.freeserve.co.uk Sun Mar 4 12:34:26 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 04 Mar 2007 17:34:26 +0000 Subject: [BioPython] IOError: [Errno 2] No such file or directory: In-Reply-To: <20070304115734.xe5ms6c7vkv8k4wk@webmail.mit.edu> References: <20070303231459.s0pe4qpb1128o0gw@webmail.mit.edu> <45EAB44F.1080909@maubp.freeserve.co.uk> <20070304115734.xe5ms6c7vkv8k4wk@webmail.mit.edu> Message-ID: <45EB0322.8050809@maubp.freeserve.co.uk> Hossein Shahsavari wrote: > I have another maybe simple question: > > I have 26 files, namely output1, output2,...,output26. I can read them > one by one but how can I read them all in an easier way, like a loop?
> I put a "for loop" by setting > > i=1 > > for i in range(1,27) > template='outputi' > > however, I got the same error as above IOError: [Errno 2] No such file or > directory: 'outputi'. It seems "i" can't be attached to the output. > > Thanks alot > > Hossein You should really try a basic introduction to python. There are lots of tutorials online, and great books too. Your questions so far are not really related to BioPython at all. Note that indentation is very important in python. You were also missing the colon at the end for line. More importantly the following line just sets the variable template to the string 'outputi', and doesn't do anything with the variable i. template='outputi' You want to do something like this: for i in range(1,27) : template = 'output' + str(i) print template Good luck. Peter From lucks at fas.harvard.edu Mon Mar 5 09:13:02 2007 From: lucks at fas.harvard.edu (Julius Lucks) Date: Mon, 5 Mar 2007 09:13:02 -0500 Subject: [BioPython] blast parsing errors Message-ID: <1E0F24AF-A4F4-4818-B7AA-5D35BF7EA260@fas.harvard.edu> Hi all, I am trying to parse a bunch of blast results that I gather via NCBIWWW.qblast(). 
I have the following code snippet: ----------- from Bio import Fasta from Bio.Blast import NCBIWWW from Bio.Blast import NCBIXML import StringIO import re #BLAST cutoff cutoff = 1e-4 #Create a fasta record: title and seq are given title = 'test' seq = 'ATCG' fasta_rec = Fasta.Record() #Sanitize title - blast does not like single quotes or \n in titles title = re.sub("'","prime",title) title = re.sub("\n","",title) fasta_rec.title = title fasta_rec.sequence = seq b_parser = NCBIXML.BlastParser() result_handle = NCBIWWW.qblast('blastp','nr',fasta_rec,ncbi_gi=1,expect=cutoff,format_type="XML",entrez_query="Viruses [ORGN]") blast_results = result_handle.read() blast_handle = StringIO.StringIO(blast_results) b_record = b_parser.parse(blast_handle) for alignment in b_record.alignments: titles = alignment.title.split('>') print titles ------------- The issue is that sometimes the blast parser chokes, with tracebacks like: File "./src/create_annotations.py", line 96, in get_blast_annotations b_record = b_parser.parse(blast_handle) File "/sw/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py", line 112, in parse self._parser.parse(handler) File "/sw/lib/python2.5/xml/sax/expatreader.py", line 107, in parse xmlreader.IncrementalParser.parse(self, source) File "/sw/lib/python2.5/xml/sax/xmlreader.py", line 123, in parse self.feed(buffer) File "/sw/lib/python2.5/xml/sax/expatreader.py", line 211, in feed self._err_handler.fatalError(exc) File "/sw/lib/python2.5/xml/sax/handler.py", line 38, in fatalError raise exception xml.sax._exceptions.SAXParseException: <unknown>:7:70: not well-formed (invalid token) I am not sure which alignment it choked on, but I would like to rescue it with a try/except block if possible. But it seems to me that if I did something like try: b_record = b_parser.parse(blast_handle) except: ... Then I would not get anything in b_record if an error is raised during parsing.
Rather, I would like to have whatever has been successful up to the point of the error stored in b_record. Is there any way to do this via the BioPython API, or do I have to dig into the python xml parsing code? Also, if anyone has a better idea of how to structure this code, I would be very appreciative. Cheers, Julius ----------------------------------------------------- http://openwetware.org/wiki/User:Lucks ----------------------------------------------------- From biopython at maubp.freeserve.co.uk Mon Mar 5 09:55:38 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 05 Mar 2007 14:55:38 +0000 Subject: [BioPython] blast parsing errors In-Reply-To: <1E0F24AF-A4F4-4818-B7AA-5D35BF7EA260@fas.harvard.edu> References: <1E0F24AF-A4F4-4818-B7AA-5D35BF7EA260@fas.harvard.edu> Message-ID: <45EC2F6A.6090200@maubp.freeserve.co.uk> Julius Lucks wrote: > Hi all, > > I am trying to parse a bunch of blast results that I gather via > NCBIWWW.qblast(). I have the following code snippet: You didn't say which version of BioPython you are using, I would guess 1.42 - there have been some Bio.Blast changes since then. Your example sequence was "ATCG", but you ran a "blastp" search. Did you really mean the peptide Ala-Thr-Cys-Gly here? If you meant to do a nucleotide search, try using "blastn" and "nr" instead. That should work better. However, there is still something funny going on. I tried your example as is using the CVS code, and it fails before it even gets the blast results back... Could you save the XML output to a file and email it to me; or even better, file a bug and attach the XML file to it.
Thanks Peter From biopython at maubp.freeserve.co.uk Mon Mar 5 10:12:25 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 05 Mar 2007 15:12:25 +0000 Subject: [BioPython] blast parsing errors In-Reply-To: <1E0F24AF-A4F4-4818-B7AA-5D35BF7EA260@fas.harvard.edu> References: <1E0F24AF-A4F4-4818-B7AA-5D35BF7EA260@fas.harvard.edu> Message-ID: <45EC3359.1030802@maubp.freeserve.co.uk> Julius Lucks wrote: > Hi all, > > I am trying to parse a bunch of blast results that I gather via > NCBIWWW.qblast(). I have the following code snippet: I am wondering if your trivial example triggered some "unusual" error page from the NCBI... I would suggest you update to CVS, as we have made a lot of changes to the Blast XML support. You would probably be safe just updating the following Bio.Blast files, located here on your machine: /sw/lib/python2.5/site-packages/Bio/Blast/NCBIStandalone.py /sw/lib/python2.5/site-packages/Bio/Blast/NCBIWWW.py /sw/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py /sw/lib/python2.5/site-packages/Bio/Blast/Record.py If you don't know how to use CVS, then just back up the originals, and replace them with the new files, downloaded one by one from here: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/?cvsroot=biopython ---------------------------------------------------------------------- This works for me using the CVS version of BioPython.
I have just made a string, rather than messing about with a fasta record object, to keep the code short: #Protein example, BLASTP from Bio.Blast import NCBIWWW from Bio.Blast import NCBIXML #BLAST cutoff cutoff = 1e-4 fasta_rec = ">GI:121308427\nrslgmevmhernahnfpldlaavevpsing" b_parser = NCBIXML.BlastParser() result_handle = NCBIWWW.qblast('blastp', 'nr', fasta_rec, ncbi_gi=1, expect=cutoff, format_type="XML", entrez_query="Viruses [ORGN]") #This returns a record iterator, changed after release of BioPython 1.42 b_records = b_parser.parse(result_handle) for b_record in b_records : print "%s found %i results" % (b_record.query, len(b_record.alignments)) for alignment in b_record.alignments: titles = alignment.title.split('>') print titles Or, if you wanted to do a nucleotide BLASTN search, try: fasta_rec = '>GI:121308427\nttagccatttatagatggaacttcaacagcagctaagtc' \ + 'tagagggaaattgtgagcattacgctcgtgcatgacctccataccaagagatct' and replace 'blastp' with 'blastn' in the call to qblast(). Peter From mdehoon at c2b2.columbia.edu Mon Mar 5 10:36:43 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Mon, 05 Mar 2007 10:36:43 -0500 Subject: [BioPython] blast parsing errors In-Reply-To: <1E0F24AF-A4F4-4818-B7AA-5D35BF7EA260@fas.harvard.edu> References: <1E0F24AF-A4F4-4818-B7AA-5D35BF7EA260@fas.harvard.edu> Message-ID: <45EC390B.8020400@c2b2.columbia.edu> Julius Lucks wrote: > seq = 'ATCG' > ... > fasta_rec.sequence = seq > ... > result_handle = NCBIWWW.qblast > ('blastp','nr',fasta_rec,ncbi_gi=1,expect=cutoff,format_type="XML",entre You have a nucleotide sequence but are running a protein-protein blast with blastp. If you run this exact search with Blast through a browser, it will show you an error message. The function _parse_qblast_ref_page(handle), which is called from NCBIWWW.qblast, chokes on this error message.
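One defensive pattern, sketched below, is to peek at the reply before handing it to the XML parser. The looks_like_blast_xml helper is hypothetical, not part of Biopython, and a real reply would come from result_handle.read():

```python
# Hypothetical guard: NCBI's XML output starts with an XML declaration,
# while the server's error pages are HTML.
def looks_like_blast_xml(reply):
    return reply.lstrip().startswith('<?xml')

# Stand-ins for two kinds of server replies.
xml_reply = '<?xml version="1.0"?><BlastOutput></BlastOutput>'
html_error = '<html><head><title>NCBI Blast</title></head><body>Error</body></html>'
```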
If you want to make this more robust, one solution might be to check for error messages returned by the Blast server in _parse_qblast_ref_page. By the way, the code can be simplified as follows: from Bio.Blast import NCBIWWW from Bio.Blast import NCBIXML #BLAST cutoff cutoff = 1e-4 #Create a fasta record: title and seq are given seq = 'ATCG' b_parser = NCBIXML.BlastParser() result_handle = NCBIWWW.qblast('blastn', 'nr', seq, ncbi_gi=1, expect=cutoff, format_type="XML", entrez_query="Viruses [ORGN]") b_records = b_parser.parse(result_handle) b_record = b_records[0] for alignment in b_record.alignments: titles = alignment.title.split('>') print titles -------------------------------------------- Note: the BlastParser currently in CVS returns a list of Blast records instead of a single Blast record, hence the b_records[0] above. Btw, with NCBIXML currently in CVS, you don't need to create b_parser first: result_handle = NCBIWWW.qblast('blastn', 'nr', seq, ncbi_gi=1, expect=cutoff, format_type="XML", entrez_query="Viruses [ORGN]") b_records = NCBIXML.parse(result_handle) b_record = b_records.next() ----------------------------------------------- --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From winter at biotec.tu-dresden.de Mon Mar 5 10:07:00 2007 From: winter at biotec.tu-dresden.de (Christof Winter) Date: Mon, 05 Mar 2007 16:07:00 +0100 Subject: [BioPython] blast parsing errors In-Reply-To: <1E0F24AF-A4F4-4818-B7AA-5D35BF7EA260@fas.harvard.edu> References: <1E0F24AF-A4F4-4818-B7AA-5D35BF7EA260@fas.harvard.edu> Message-ID: <45EC3214.5050100@biotec.tu-dresden.de> Running your example, I get: >>> ## working on region in file /tmp/python-18415Uda.py... Traceback (most recent call last): File "<stdin>", line 1, in ? File "/tmp/python-18415Uda.py", line 25, in ?
result_handle = NCBIWWW.qblast('blastp','nr',fasta_rec,ncbi_gi=1,expect=cutoff,format_type="XML",entrez_query="Viruses [ORGN]") File "/var/lib/python-support/python2.4/Bio/Blast/NCBIWWW.py", line 1091, in qblast rid, rtoe = _parse_qblast_ref_page(handle) File "/var/lib/python-support/python2.4/Bio/Blast/NCBIWWW.py", line 1133, in _parse_qblast_ref_page return rid, int(rtoe) ValueError: invalid literal for int(): > NCBI Blast title = re.sub("\n","",title) > fasta_rec.title = title > fasta_rec.sequence = seq > > > b_parser = NCBIXML.BlastParser() > > result_handle = NCBIWWW.qblast > ('blastp','nr',fasta_rec,ncbi_gi=1,expect=cutoff,format_type="XML",entre > z_query="Viruses [ORGN]") > blast_results = result_handle.read() > > blast_handle = StringIO.StringIO(blast_results) > b_record = b_parser.parse(blast_handle) > > for alignment in b_record.alignments: > titles = alignment.title.split('>') > print titles > > ------------- > > > The issue is sometimes the blast parser chokes with tracebacks like: > > File "./src/create_annotations.py", line 96, in get_blast_annotations > b_record = b_parser.parse(blast_handle) File "/sw/lib/python2.5/ > site-packages/Bio/Blast/NCBIXML.py", line 112, in parse > self._parser.parse(handler) File "/sw/lib/python2.5/xml/sax/ > expatreader.py", line 107, in parse > xmlreader.IncrementalParser.parse(self, source) File "/sw/lib/ > python2.5/xml/sax/xmlreader.py", line 123, in parse > self.feed(buffer) > File "/sw/lib/python2.5/xml/sax/expatreader.py", line 211, in feed > self._err_handler.fatalError(exc) > File "/sw/lib/python2.5/xml/sax/handler.py", line 38, in > fatalError raise exception > xml.sax._exceptions.SAXParseException: :7:70: not well- > formed (invalid token) > > I am not sure which alignment it choked on, but I would like to > rescue it with a try/except block if possible. But it seems to me > that if I did something like > > try: > b_record = b_parser.parse(blast_handle) > except: > ... 
> > Then I would not get anything in b_record if an error raised in the > parsing. Rather, I would like to have whatever has been successful > up to the point of the error stored in b_record. > > Is there any way to do this via the BioPython API, or do I have to > dig into the python xml parsing code? > > Also, if anyone has a better idea of how to structure this code, I > would be very appreciative. > > Cheers, > > Julius > > ----------------------------------------------------- > http://openwetware.org/wiki/User:Lucks > ----------------------------------------------------- > > > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From lucks at fas.harvard.edu Mon Mar 5 11:24:58 2007 From: lucks at fas.harvard.edu (Julius Lucks) Date: Mon, 5 Mar 2007 11:24:58 -0500 Subject: [BioPython] blast parsing errors In-Reply-To: <45EC2F6A.6090200@maubp.freeserve.co.uk> References: <1E0F24AF-A4F4-4818-B7AA-5D35BF7EA260@fas.harvard.edu> <45EC2F6A.6090200@maubp.freeserve.co.uk> Message-ID: <27D93A40-C5AC-4708-BF4C-0ADCFD413B46@fas.harvard.edu> Thanks guys, You are right - I am using BioPython 1.42, and python2.5 installed via fink on Mac OS X. I meant to use an amino acid sequence for the seq variable, and I have included the revised code snippet which uses the protein sequence that gave me trouble in the first place. However, there is no problem when using the current CVS code. Thanks for all of your help! I have 3 questions: 1.) Is the documentation for the new NCBIXML and NBCIWWW up to date? 2.) Why is NCBIXML.parse returning an iterator in this case since there is only one result? Or in other words, what are the use cases where an iterator is necessary? 3.) How are the fink packages of Biopython maintained? I am using the fink unstable tree, which means that I am getting the most current version that fink has. 
If Biopython 1.44 is substantially different from 1.42 (current fink), can we update the fink version faster than we currently do? Cheers, Julius ---------- code that works in Biopython 1.44 -------- from Bio import Fasta from Bio.Blast import NCBIWWW from Bio.Blast import NCBIXML import StringIO import re #BLAST cutoff cutoff = 1e-4 #Create a fasta record: title and seq are given title = 'test' seq = '\ MESFSVQAYLKATDNNFVSTFKDAAKQVQNF\ EKNTNSTMSTVGKVATSTGKTLTKAVTVPII\ GIGVAAAKIGGDFESQMSRVKAISGATGSSF\ EELRQQAIDLGAKTAFSAKESASGMENLASA\ GFNAKEIMEAMPGLLDLAAVSGGDVALASEN\ AATALRGFNLDASQSGHVANVFAKAAADTNA\ EVGDMGEAMKYIAPVANSMGLSIEEVSAAIG\ IMSDAGIKGSQAGTSLRGALSRLADPTDAMQ\ AKMDELGLSFYDSEGKMKPLKDQIGMLKDAF\ KGLTPEQQQNALVTLYGQESLSGMMALIDKG\ PDKLGKLTESLKNSDGAADKMAKTMQDNMNS\ SLEQMMGAFESAAIVVQKILAPAVRKVADSI\ SGLVDKFVSAPEPVQKMIVTIGLIVAAIGPL\ LVIFGQAVVTLQRVKVGFLALRSGLALIGGS\ FTAISLPVLGIIAAIAAVIAIGILVYKNWDK\ ISKFGKEVWANVKKFASDAAEVIKEKWGDIT\ QWFSDTWNNIKNGAKGLWDGTVQGAKNAVDS\ VKNAWNGIKEWFTNLWKGTTSGLSSAWDSVT\ TTLAPFVETIKTIFQPILDFFSGLWGQVQTI\ FGSAWEIIKTVVMGPVLLLIDLITGDFNQFK\ KDFAMLWQTLFTNIQTLVTTYVQIVVGFFTA\ WGQTVSNIWTTVVNTIQSLWGAFTTWVINMA\ KSIVDGIVNGWNSFKQGTVDLWNATVQWVKD\ TWASFKQWVVDSANAIVNGVKQGWENLKQGT\ IDLWNGMINGLKGIWDGLKQSVRNLIDNVKT\ TFNNLKNINLLDIGKAIIDGLVKGLKKKWED\ GMKFISGIGDWIRKHKGPIRKDRKLLIPAGK\ AIMTGLNSGLTGGFRNVQSNVSGMGDMIANA\ INSDYSVDIGANVAAANRSISSQVSHDVNLN\ QGKQPASFTVKLGNQIFKAFVDDISNAQGQA\ INLNMGF*' fasta_rec = Fasta.Record() #Sanitize title - blast does not like single quotes or \n in titles title = re.sub("'","prime",title) title = re.sub("\n","",title) fasta_rec.title = title fasta_rec.sequence = seq result_handle = NCBIWWW.qblast('blastp','nr',fasta_rec,ncbi_gi=1,expect=cutoff,format_type="XML",entrez_query="Viruses [ORGN]") b_records = NCBIXML.parse(result_handle) for b_record in b_records: print "%s found %i results" % (b_record.query, len(b_record.alignments)) for alignment in b_record.alignments: titles = alignment.title.split('>') print titles ----------
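As a side note, the split('>') in the loop above is there because titles of hits from the nr database can concatenate several definition lines into one string, separated by '>'. A tiny illustration on a made-up title (the string below is invented, not real BLAST output):

```python
# Invented nr-style title: two definition lines joined with '>'.
title = 'gi|123|ref|NP_000001.1| coat protein [Some virus] >gi|456|gb|AAA00001.1| coat protein'
titles = title.split('>')
# titles now holds one entry per definition line
```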
----------------------------------------------------- http://openwetware.org/wiki/User:Lucks ----------------------------------------------------- On Mar 5, 2007, at 9:55 AM, Peter wrote: > Julius Lucks wrote: >> Hi all, >> I am trying to parse a bunch of blast results that I gather via >> NCBIWWW.qblast(). I have the following code snipit: > > You didn't say which version of BioPython you are using, I would > guess 1.42 - there have been some Bio.Blast changes since than. > > Your example sequence was "ATCG", but you ran a "blastp" search. > Did you really mean the peptide Ala-Thr-Cys-Gly here? > > If you meant to do a nucleotide search, try using "blastn" and "nr" > instead. That should work better. > > However, there is still something funny going on. I tried your > example as is using the CVS code, and it fails before it even gets > the blast results back... > > Could you save the XML output to a file and email it to me; or even > better file a bug an attach the XML file to the bug. > > Thanks > > Peter From mdehoon at c2b2.columbia.edu Mon Mar 5 11:49:53 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Mon, 05 Mar 2007 11:49:53 -0500 Subject: [BioPython] blast parsing errors In-Reply-To: <27D93A40-C5AC-4708-BF4C-0ADCFD413B46@fas.harvard.edu> References: <1E0F24AF-A4F4-4818-B7AA-5D35BF7EA260@fas.harvard.edu> <45EC2F6A.6090200@maubp.freeserve.co.uk> <27D93A40-C5AC-4708-BF4C-0ADCFD413B46@fas.harvard.edu> Message-ID: <45EC4A31.9010207@c2b2.columbia.edu> Julius Lucks wrote: > 1.) Is the documentation for the new NCBIXML and NBCIWWW up to date? No it is not. To ensure that the documentation on the website agrees with the current Biopython release, the idea was to update the documentation when the next Biopython release comes out. Originally we were planning to make a new Biopython release as soon as the new Bio.SeqIO code is done. 
However, I'd be happy to make a release in the immediate future without the new Bio.SeqIO, and make another one once Bio.SeqIO is ready. > 2.) Why is NCBIXML.parse returning an iterator in this case since there > is only one result? Or in other words, what are the use cases where an > iterator is necessary? If you're parsing multiple Blast search results at the same time. In other words, if the fasta file for the blast search looked like > gene1 ATAGCTACG... > gene2 ATCGATCGATGGCA... > gene3 .... Such a file can be very large, which is why we are using an iterator instead of a list. Now, one may argue that NCBIXML.parse should return a single record instead of an iterator if there's only one result. Others may argue that for consistency, it should always return an iterator. Either way is fine with me. Anybody have a strong opinion about this? > 3.) How are the fink packages of Biopython maintained? I don't know. But, it's not too difficult to install Biopython from the source distribution or from CVS. So if you want to be sure you have the latest version, you might want to try installing from CVS. --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From biopython at maubp.freeserve.co.uk Tue Mar 6 11:27:58 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 06 Mar 2007 16:27:58 +0000 Subject: [BioPython] Bio.Kabat and the Kabat Database Message-ID: <45ED968E.1010102@maubp.freeserve.co.uk> I've been looking though the modules in BioPython, and had a closer look at Bio.Kabat written by Katharine Lindner in 2001 to parse files from the Kabat database of proteins of immunological interest: http://www.kabatdatabase.com/ Quoting the website, > 01 September 2006 > > Interested parties may purchase the Database, in ASCII text > structured flat files, as well as an SQL relationship database (not > previously available), for $2250 US. 
> > This one-time license fee is unrestricted, except for distribution. > > Analysis Tools > > The searching and analysis tools are additionally available. > Included are generalized lookup, aligned sequence searching light > chain alignment, length distribution, positional correlation, > variability, and much more. Please contact for quote. Does anyone use the Bio.Kabat code? Could we (or should we) mark it as deprecated for the next release of BioPython? Peter From snakepit.rattlesnakes at gmail.com Mon Mar 12 05:59:25 2007 From: snakepit.rattlesnakes at gmail.com (Joydeep Mitra) Date: Mon, 12 Mar 2007 15:29:25 +0530 Subject: [BioPython] Retrieving the raw sequence from sequence object... Message-ID: <972566ff0703120259k3979c223r2172f631d48fa6fd@mail.gmail.com> Hi, I'm a student of bioinformatics (coming from a biological background). I've just started using biopython for parsing biological file formats. The Bio.Fasta module contains the fasta iterator object, which spits out sequence objects...of the form: Seq('CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGTTGAGATCACATAATAAT ...', IUPACAmbiguousDNA()) I want to retrieve the sequence in it's entirety and in raw format....how does one do that using an instance object? I've tried a few things without success...will be glad if some1 could show me how... Thanking in advance, Joy From sdavis2 at mail.nih.gov Mon Mar 12 06:20:11 2007 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Mon, 12 Mar 2007 06:20:11 -0400 Subject: [BioPython] Retrieving the raw sequence from sequence object... In-Reply-To: <972566ff0703120259k3979c223r2172f631d48fa6fd@mail.gmail.com> References: <972566ff0703120259k3979c223r2172f631d48fa6fd@mail.gmail.com> Message-ID: <200703120620.11871.sdavis2@mail.nih.gov> On Monday 12 March 2007 05:59, Joydeep Mitra wrote: > Hi, > I'm a student of bioinformatics (coming from a biological background). > > I've just started using biopython for parsing biological file formats. 
> The Bio.Fasta module contains the fasta iterator object, which spits out > sequence objects...of the form: > > Seq('CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGTTGAGATCACATAATAAT ...', > IUPACAmbiguousDNA()) > > I want to retrieve the sequence in it's entirety and in raw format....how > does one do that using an instance object? > I've tried a few things without success...will be glad if some1 could show > me how... If you have a sequence object, "myseq": myseq.tostring() See here for more details: http://biopython.org/DIST/docs/tutorial/Tutorial.html Section 2.2. Hope that helps. Sean From lucks at fas.harvard.edu Wed Mar 14 18:16:32 2007 From: lucks at fas.harvard.edu (Julius Lucks) Date: Wed, 14 Mar 2007 18:16:32 -0400 Subject: [BioPython] Biopython hackathon Message-ID: <21C70692-33D8-457E-AB5B-D4701E2704FB@fas.harvard.edu> Hi all, I was just chatting with Jason Stajich about the openbio hackathon that took place a few years ago. What biopython projects do people think would be appropriate for another hackathon in the near future? Are there things that have been on the TODO list for a while? New functionality that would benefit from a bunch of us getting together in one place (possibly with other openbio projects)? 
Cheers, Julius ----------------------------------------------------- http://openwetware.org/wiki/User:Lucks ----------------------------------------------------- From aloraine at gmail.com Wed Mar 14 19:43:23 2007 From: aloraine at gmail.com (Ann Loraine) Date: Wed, 14 Mar 2007 17:43:23 -0600 Subject: [BioPython] Biopython hackathon In-Reply-To: <21C70692-33D8-457E-AB5B-D4701E2704FB@fas.harvard.edu> References: <21C70692-33D8-457E-AB5B-D4701E2704FB@fas.harvard.edu> Message-ID: <83722dde0703141643q52c04a03l4d2b28926e8aff3a@mail.gmail.com> I hope you will consider the following two requests as possible hackathon activities: (1) If it does not already do this, it would be nice if the blast "plain text" (non-XML) parser would report the length of the target ("hit") sequence as well as the query. If I recall correctly, the last time I used the plain text blast parser, I had to measure the length of the targets by opening up the fasta copy of the blastable database and reading the lengths one-by-one. My database wasn't very big, so it wasn't a hassle to do this, but I can foresee situations where this kludge would fail. (2) Another request is for a Python interface to the Bio::Db::SQL database schema described in: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12368253. The part that seems particularly valuable would be code that to construct the location-based queries. My apologies if this already exists -- if yes, please let me know where I can find it!! All the best, Ann Loraine On 3/14/07, Julius Lucks wrote: > Hi all, > > I was just chatting with Jason Stajich about the openbio hackathon > that took place a few years ago. What biopython projects do people > think would be appropriate for another hackathon in the near future? > Are there things that have been on the TODO list for a while? New > functionality that would benefit from a bunch of us getting together > in one place (possibly with other openbio projects)? 
> > Cheers, > > Julius > > ----------------------------------------------------- > http://openwetware.org/wiki/User:Lucks > ----------------------------------------------------- > > > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From cjfields at uiuc.edu Wed Mar 14 20:29:43 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 14 Mar 2007 19:29:43 -0500 Subject: [BioPython] Biopython hackathon In-Reply-To: <83722dde0703141643q52c04a03l4d2b28926e8aff3a@mail.gmail.com> References: <21C70692-33D8-457E-AB5B-D4701E2704FB@fas.harvard.edu> <83722dde0703141643q52c04a03l4d2b28926e8aff3a@mail.gmail.com> Message-ID: <8FC21D46-FEC1-43FF-B770-BC4D05E569D5@uiuc.edu> On Mar 14, 2007, at 6:43 PM, Ann Loraine wrote: > ... > (2) Another request is for a Python interface to the Bio::Db::SQL > database schema described in: > > http://www.ncbi.nlm.nih.gov/entrez/query.fcgi? > cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12368253. > > The part that seems particularly valuable would be code that to > construct the location-based queries. Did you mean Bio::DB::GFF? AFAIK Lincoln is moving stuff over to a newer system that better facilitates GFF3, namely Bio::DB::SeqFeature. I'm not sure how well Bio::DB::GFF is supported currently. > My apologies if this already exists -- if yes, please let me know > where I can find it!! You can find out about the current state of GBrowse affairs by emailing the GBrowse mail list. Here's the sourceforge link: http://sourceforge.net/mailarchive/forum.php?forum_id=31947 Scott and Lincoln can indicate where their current focus is re: GFF3 and sequence feature database development. chris > All the best, > > Ann Loraine Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From aloraine at gmail.com Thu Mar 15 02:52:13 2007 From: aloraine at gmail.com (Ann Loraine) Date: Thu, 15 Mar 2007 00:52:13 -0600 Subject: [BioPython] Biopython hackathon In-Reply-To: <8FC21D46-FEC1-43FF-B770-BC4D05E569D5@uiuc.edu> References: <21C70692-33D8-457E-AB5B-D4701E2704FB@fas.harvard.edu> <83722dde0703141643q52c04a03l4d2b28926e8aff3a@mail.gmail.com> <8FC21D46-FEC1-43FF-B770-BC4D05E569D5@uiuc.edu> Message-ID: <83722dde0703142352y325698bag5d43c4a771bb1e5@mail.gmail.com> Thanks, I meant Bio::DB::GFF -- the schema shown in the paper. A simple schema that represents features on genomic sequence and easily supports fast region-based queries is what I'm after. The indexing scheme in the paper looked good, so I was hoping to find python code that would hide the details of formulating the SQL. My main goal is speed -- running the queries and then outputting data in GFF, bed, or DAS XML, as the need arises. -Ann On 3/14/07, Chris Fields wrote: > > On Mar 14, 2007, at 6:43 PM, Ann Loraine wrote: > > > ... > > (2) Another request is for a Python interface to the Bio::Db::SQL > > database schema described in: > > > > http://www.ncbi.nlm.nih.gov/entrez/query.fcgi? > > cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12368253. > > > > The part that seems particularly valuable would be code that to > > construct the location-based queries. > > Did you mean Bio::DB::GFF? AFAIK Lincoln is moving stuff over to a > newer system that better facilitates GFF3, namely > Bio::DB::SeqFeature. I'm not sure how well Bio::DB::GFF is supported > currently. > > > My apologies if this already exists -- if yes, please let me know > > where I can find it!! > > You can find out about the current state of GBrowse affairs by > emailing the GBrowse mail list. 
Here's the sourceforge link: > > http://sourceforge.net/mailarchive/forum.php?forum_id=31947 > > Scott and Lincoln can indicate where their current focus is re: GFF3 > and sequence feature database development. > > chris > > > All the best, > > > > Ann Loraine > > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > -- Ann Loraine, Assistant Professor Departments of Genetics, Biostatistics, Computer and Information Sciences Associate Scientist, Comprehensive Cancer Center University of Alabama at Birmingham http://www.transvar.org 205-996-4155 From mdehoon at c2b2.columbia.edu Fri Mar 16 15:09:10 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Fri, 16 Mar 2007 15:09:10 -0400 Subject: [BioPython] Bio.Kabat and the Kabat Database In-Reply-To: <45ED968E.1010102@maubp.freeserve.co.uk> References: <45ED968E.1010102@maubp.freeserve.co.uk> Message-ID: <45FAEB56.9080509@c2b2.columbia.edu> Peter wrote: > Does anyone use the Bio.Kabat code? Could we (or should we) mark it as > depreciated for the next release of BioPython? Since no users of Bio.Kabat came forward, I've marked it as deprecated for the upcoming release. This only means that importing Bio.Kabat will show a warning message, so the code is still usable. If still no users come forward, we can remove Bio.Kabat in a later release. --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From mdehoon at c2b2.columbia.edu Sat Mar 17 19:26:50 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Sat, 17 Mar 2007 19:26:50 -0400 Subject: [BioPython] Biopython release 1.43 Message-ID: <45FC793A.2010106@c2b2.columbia.edu> Dear Biopythoneers, We are pleased to announce the release of Biopython 1.43. 
This release includes a brand-new set of parsers in Bio.SeqIO by Peter Cock for reading biological sequence files in various formats, an updated Blast XML parser in Bio.Blast.NCBIXML, a new UniGene flat-file parser by Sean Davis, and numerous improvements and bug fixes in Bio.PDB, Bio.SwissProt, Bio.Nexus, BioSQL, and others. Believe it or not, even the documentation was updated. Source distributions and Windows installers are available from the Biopython website at http://biopython.org. My thanks to all code contributers who made this new release possible. --Michiel on behalf of the Biopython developers -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From rohini.damle at gmail.com Mon Mar 19 17:08:36 2007 From: rohini.damle at gmail.com (Rohini Damle) Date: Mon, 19 Mar 2007 13:08:36 -0800 Subject: [BioPython] pdf articles from medline Message-ID: Hi, Does anyone know if there is any provision in Biopython to download PDF articles from Medline, if we have a list of pubmed ids? Thank you for your help. -Rohini. From sdavis2 at mail.nih.gov Mon Mar 19 18:37:23 2007 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Mon, 19 Mar 2007 18:37:23 -0400 Subject: [BioPython] pdf articles from medline In-Reply-To: References: Message-ID: <45FF10A3.7090708@mail.nih.gov> Rohini Damle wrote: > Hi, > Does anyone know if there is any provision in Biopython to download > PDF articles from Medline, if we have a list of pubmed ids? > Medline does not store the PDFs, in general, so I don't think that is possible. You could certainly scrape the HTML for links and then follow them, looking for a PDF link, but there isn't a general solution for all journals, etc. Sean From jingzhou2005 at gmail.com Wed Mar 21 11:49:20 2007 From: jingzhou2005 at gmail.com (Jing Zhou) Date: Wed, 21 Mar 2007 11:49:20 -0400 Subject: [BioPython] can PDBParser work on a pdb file without residue sequence information? 
Message-ID: <4b036ac00703210849n43b41ff8hd1f52784452f1d3c@mail.gmail.com> I want to parse a pseudo pdb file I generated for a bunch of points in space. There is no physical meaning to assign residue number or chain number to it. Here is the example of one line: ATOM 1 C -7.083 -6.182 50.181 1.00 (the atom type has no physical meaning either) I just want to be able to display the positions of these points in viewer. What I want to do is to use biopython to parse this pseudo pdb file and display it in vtk. (I am already able to see the points in pymol using this pseudo pdb file) Here is the function I tried to use to parse this pseudo pdb file: from Bio.PDB import PDBParser parser = PDBParser() structure = parser.get_structure('mypdb',mypseudopdbfile) Here is the error message: structure = parser.get_structure('mypdb',mypseudopdbfile) File "C:\Python24\lib\site-packages\Bio\PDB\PDBParser.py", line 66, in get_structure self._parse(file.readlines()) File "C:\Python24\lib\site-packages\Bio\PDB\PDBParser.py", line 87, in _parse self.trailer=self._parse_coordinates(coords_trailer) File "C:\Python24\lib\site-packages\Bio\PDB\PDBParser.py", line 144, in _parse_coordinates resseq=int(split(line[22:26])[0]) # sequence identifier IndexError: list index out of range It seems that the error happens to seek residue sequence information at column 22-26. Sure, I can try to add fake resseq info. but my question is whether there is another way around to neglect the index of residue sequence number? I just want to directly get the array of coordinates and display them. Thanks Jing From zsun at fas.harvard.edu Wed Mar 21 23:41:22 2007 From: zsun at fas.harvard.edu (Zachary Zhipeng Sun) Date: Wed, 21 Mar 2007 23:41:22 -0400 Subject: [BioPython] BLAST SNP access? Message-ID: <000c01c76c33$f98c93a0$eca5bae0$@harvard.edu> Hello, Thank you for the Bioython tools! They are proving increasingly useful in my research. 
I had a question regarding a tool extension, however - is there any link in biopython to query the BLAST SNP database, or does anyone know of this being under development? If not, I am not too familiar with the backend of biopython but I was looking to be able to automate BLAST SNP searches; does anyone have advice on how to start coding this into the biopython environment, or to a version of NCBIWWW.py ? Thanks for your help! Best, Zachary Sun From mdehoon at c2b2.columbia.edu Thu Mar 22 00:21:40 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Thu, 22 Mar 2007 00:21:40 -0400 Subject: [BioPython] BLAST SNP access? In-Reply-To: <000c01c76c33$f98c93a0$eca5bae0$@harvard.edu> References: <000c01c76c33$f98c93a0$eca5bae0$@harvard.edu> Message-ID: <46020454.2060100@c2b2.columbia.edu> Hi Zach, Can you give us an example of how you currently query the BLAST SNP database (without using Biopython)? Just to get an idea of how different it is from current BLAST searches with Biopython. --Michiel. Zachary Zhipeng Sun wrote: > Hello, > > > > Thank you for the Bioython tools! They are proving increasingly useful in my > research. I had a question regarding a tool extension, however - is there > any link in biopython to query the BLAST SNP database, or does anyone know > of this being under development? If not, I am not too familiar with the > backend of biopython but I was looking to be able to automate BLAST SNP > searches; does anyone have advice on how to start coding this into the > biopython environment, or to a version of NCBIWWW.py ? Thanks for your help! > > > > Best, > > Zachary Sun > > > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From cjfields at uiuc.edu Thu Mar 22 00:38:11 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 21 Mar 2007 23:38:11 -0500 Subject: [BioPython] BLAST SNP access? 
In-Reply-To: <46020454.2060100@c2b2.columbia.edu> References: <000c01c76c33$f98c93a0$eca5bae0$@harvard.edu> <46020454.2060100@c2b2.columbia.edu> Message-ID: <1CB2E066-3A73-429E-80B6-639F26FAD144@uiuc.edu> If you are using the NCBI URLAPI interface you can set the databases to anything on the following page, just follow the instructions: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/ remote_accessible_blastdblist.html This includes SNP data. I find this works for BioPerl's RemoteBlast (URLAPI-based). chris On Mar 21, 2007, at 11:21 PM, Michiel de Hoon wrote: > Hi Zach, > > Can you give us an example of how you currently query the BLAST SNP > database (without using Biopython)? Just to get an idea of how > different > it is from current BLAST searches with Biopython. > > --Michiel. > > Zachary Zhipeng Sun wrote: >> Hello, >> >> >> >> Thank you for the Bioython tools! They are proving increasingly >> useful in my >> research. I had a question regarding a tool extension, however - >> is there >> any link in biopython to query the BLAST SNP database, or does >> anyone know >> of this being under development? If not, I am not too familiar >> with the >> backend of biopython but I was looking to be able to automate >> BLAST SNP >> searches; does anyone have advice on how to start coding this into >> the >> biopython environment, or to a version of NCBIWWW.py ? Thanks for >> your help! >> >> >> >> Best, >> >> Zachary Sun >> >> >> >> _______________________________________________ >> BioPython mailing list - BioPython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From zsun at fas.harvard.edu Thu Mar 22 01:01:44 2007 From: zsun at fas.harvard.edu (Zachary Zhipeng Sun) Date: Thu, 22 Mar 2007 01:01:44 -0400 Subject: [BioPython] BLAST SNP access? In-Reply-To: <1CB2E066-3A73-429E-80B6-639F26FAD144@uiuc.edu> References: <000c01c76c33$f98c93a0$eca5bae0$@harvard.edu> <46020454.2060100@c2b2.columbia.edu> <1CB2E066-3A73-429E-80B6-639F26FAD144@uiuc.edu> Message-ID: <000601c76c3f$2fab1870$8f014950$@harvard.edu> Thanks for the replies! Regarding a BLAST SNP search, it uses the blastn or tblastn interface; put in query at http://www.ncbi.nlm.nih.gov/SNP/snp_blastByOrg.cgi, choose db, and result interface is similar to other BLAST components. (sample RID 1174538798-721-146214187572.BLASTQ2 for sample output), except that in the title of hits it shows an rs# which links to dbSNP. I'd imagine the search options would be different, as well as a little bit of parsing the output, but it uses the same engine as blastn. Regarding the URLAPI then: sorry, I'm pretty new to biopython, but is the qblast search feature in the current biopython 1.43 build based on commands to the NCBI URLAPI? If so, then (from someone with moderate experience in coding but little in python or perl) would I be able to painlessly modify the biopython NCBIWWW.py code for my own use? -Zach -----Original Message----- From: Chris Fields [mailto:cjfields at uiuc.edu] Sent: Thursday, March 22, 2007 12:38 AM To: Michiel de Hoon Cc: Zachary Zhipeng Sun; biopython at lists.open-bio.org Subject: Re: [BioPython] BLAST SNP access? If you are using the NCBI URLAPI interface you can set the databases to anything on the following page, just follow the instructions: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/ remote_accessible_blastdblist.html This includes SNP data. I find this works for BioPerl's RemoteBlast (URLAPI-based). 
chris On Mar 21, 2007, at 11:21 PM, Michiel de Hoon wrote: > Hi Zach, > > Can you give us an example of how you currently query the BLAST SNP > database (without using Biopython)? Just to get an idea of how > different > it is from current BLAST searches with Biopython. > > --Michiel. > > Zachary Zhipeng Sun wrote: >> Hello, >> >> >> >> Thank you for the Bioython tools! They are proving increasingly >> useful in my >> research. I had a question regarding a tool extension, however - >> is there >> any link in biopython to query the BLAST SNP database, or does >> anyone know >> of this being under development? If not, I am not too familiar >> with the >> backend of biopython but I was looking to be able to automate >> BLAST SNP >> searches; does anyone have advice on how to start coding this into >> the >> biopython environment, or to a version of NCBIWWW.py ? Thanks for >> your help! >> >> >> >> Best, >> >> Zachary Sun >> >> >> >> _______________________________________________ >> BioPython mailing list - BioPython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From mdehoon at c2b2.columbia.edu Thu Mar 22 14:54:59 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Thu, 22 Mar 2007 14:54:59 -0400 Subject: [BioPython] BLAST SNP access? In-Reply-To: <000601c76c3f$2fab1870$8f014950$@harvard.edu> References: <000c01c76c33$f98c93a0$eca5bae0$@harvard.edu> <46020454.2060100@c2b2.columbia.edu> <1CB2E066-3A73-429E-80B6-639F26FAD144@uiuc.edu> <000601c76c3f$2fab1870$8f014950$@harvard.edu> Message-ID: <4602D103.20106@c2b2.columbia.edu> Zachary Zhipeng Sun wrote: > Thanks for the replies! 
Regarding a BLAST SNP search, it uses the blastn or > tblastn interface; put in query at > http://www.ncbi.nlm.nih.gov/SNP/snp_blastByOrg.cgi, choose db, and result > interface is similar to other BLAST components. (sample RID > 1174538798-721-146214187572.BLASTQ2 for sample output), except that in the > title of hits it shows an rs# which links to dbSNP. I'd imagine the search > options would be different, as well as a little bit of parsing the output, > but it uses the same engine as blastn. It looks like that if you know the name of the database (here "snp/human_9606/human_9606"), then you can run for example from Bio.Blast import NCBIWWW result_handle = NCBIWWW.qblast("blastn", "snp/human_9606/human_9606", seq) and then parse the results as usual (see section 3.4 in the Biopython tutorial). Check on the page that Chris sent you: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_accessible_blastdblist.html to find the correct name for the database. The result_handle should give you the same information as the web page, and Biopython's parser should parse all information from result_handle correctly. If you find that some information seems to be missing, please let us know. Hope this helps, --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From mauriceling at gmail.com Thu Mar 22 19:47:14 2007 From: mauriceling at gmail.com (Maurice Ling) Date: Fri, 23 Mar 2007 10:47:14 +1100 Subject: [BioPython] How to read SOFT files from GEO? Message-ID: <46031582.7060301@acm.org> Hi, Are there any examples to read SOFT file from GEO? I've looked in the cookbook and there isn't any mention about GEO. Thanks in advance, maurice From sdavis2 at mail.nih.gov Thu Mar 22 20:53:01 2007 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu, 22 Mar 2007 20:53:01 -0400 Subject: [BioPython] How to read SOFT files from GEO? 
In-Reply-To: <46031582.7060301@acm.org> References: <46031582.7060301@acm.org> Message-ID: <460324ED.8040504@mail.nih.gov> Maurice Ling wrote: > Hi, > > Are there any examples to read SOFT file from GEO? I've looked in the > cookbook and there isn't any mention about GEO. > I don't know of biopython code to do this, but there is a package for the R statistical language that will do this. The nice thing about doing this from R is that there are hundreds of tools that can then be applied to whatever data you like. The package is part of Bioconductor (http://www.bioconductor.org) and is called GEOquery. And of course, you can use R from python using Rpy. Sean P.S. Legal disclaimer--I am the author of said package, so take my words with the appropriate grain of salt. From mauriceling at gmail.com Thu Mar 22 21:06:07 2007 From: mauriceling at gmail.com (Maurice Ling) Date: Fri, 23 Mar 2007 12:06:07 +1100 Subject: [BioPython] How to read SOFT files from GEO? In-Reply-To: <460324ED.8040504@mail.nih.gov> References: <46031582.7060301@acm.org> <460324ED.8040504@mail.nih.gov> Message-ID: <460327FF.5070909@acm.org> Sean Davis wrote: > Maurice Ling wrote: > >> Hi, >> >> Are there any examples to read SOFT file from GEO? I've looked in the >> cookbook and there isn't any mention about GEO. >> > > I don't know of biopython code to do this, but there is a package for > the R statistical language that will do this. The nice thing about > doing this from R is that there are hundreds of tools that can then be > applied to whatever data you like. The package is part of > Bioconductor (http://www.bioconductor.org) and is called GEOquery. > And of course, you can use R from python using Rpy. > > Sean > > P.S. Legal disclaimer--I am the author of said package, so take my > words with the appropriate grain of salt. > In biopython's CVS, there is a subdirectory called Geo. So I thought that might be for SOFT files... ML P.S. So I know who to ask when I have questions about GEOquery. 
From sdavis2 at mail.nih.gov Thu Mar 22 21:44:42 2007 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu, 22 Mar 2007 21:44:42 -0400 Subject: [BioPython] How to read SOFT files from GEO? In-Reply-To: <460327FF.5070909@acm.org> References: <46031582.7060301@acm.org> <460324ED.8040504@mail.nih.gov> <460327FF.5070909@acm.org> Message-ID: <4603310A.20804@mail.nih.gov> Maurice Ling wrote: > Sean Davis wrote: > >> Maurice Ling wrote: >> >>> Hi, >>> >>> Are there any examples to read SOFT file from GEO? I've looked in >>> the cookbook and there isn't any mention about GEO. >>> >> >> I don't know of biopython code to do this, but there is a package for >> the R statistical language that will do this. The nice thing about >> doing this from R is that there are hundreds of tools that can then >> be applied to whatever data you like. The package is part of >> Bioconductor (http://www.bioconductor.org) and is called GEOquery. >> And of course, you can use R from python using Rpy. >> >> Sean >> >> P.S. Legal disclaimer--I am the author of said package, so take my >> words with the appropriate grain of salt. >> > In biopython's CVS, there is a subdirectory called Geo. So I thought > that might be for SOFT files... > I haven't tried it lately, but last I looked, it was not in-sync with GEO (which does infrequently change tags/formats). Also, I think it handles GDS only (if I remember correctly). Only a subset of GEO is available as GDS, since they require hand-curation by GEO staff. http://portal.open-bio.org/pipermail/biopython-dev/2006-May/002352.html > P.S. So I know who to ask when I have questions about GEOquery. Comments and questions are welcome and appreciated. 
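[Editorial note for readers of this thread: the SOFT format discussed above is line-oriented, so a rough reading of it fits in a few lines. The sketch below illustrates the `^` (entity), `!` (attribute), and `#` (column description) line conventions only; it is not the Bio.Geo or GEOquery API, and real SOFT files have more structure than this handles:]

```python
def parse_soft(lines):
    # Minimal SOFT reader: ^ starts a new entity, ! is an attribute,
    # # is a column description (ignored here), anything else is a
    # tab-separated data-table row belonging to the current entity.
    entities = []
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("^"):
            kind, _, name = line[1:].partition(" = ")
            current = {"type": kind, "name": name,
                       "attributes": {}, "table": []}
            entities.append(current)
        elif line.startswith("!") and current is not None:
            key, _, value = line[1:].partition(" = ")
            current["attributes"][key] = value
        elif line.startswith("#"):
            continue  # column descriptions, ignored in this sketch
        elif current is not None:
            current["table"].append(line.split("\t"))
    return entities

sample = [
    "^SAMPLE = GSM1000",
    "!Sample_title = test hybridization",
    "#ID_REF = probe identifier",
    "AFFX-1\t5.2",
]
records = parse_soft(sample)
```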
Sean From chris.lasher at gmail.com Sat Mar 24 12:35:04 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Sat, 24 Mar 2007 12:35:04 -0400 Subject: [BioPython] Biopython to migrate to Subversion Message-ID: <128a885f0703240935p5c139736yfe3142bdbc9d6ac4@mail.gmail.com> Hello Biopythonistas, The Biopython developers are currently planning a migration from CVS to Subversion as our revision control system. The target for the migration is the evening of Sunday May 20, 2007. This change will mostly impact the developers; however, this may also affect some users. If you're a user of Biopython and you... A) install Biopython from the Windows installer, from packages for your Linux distribution, or from Fink on OS X... you will not be affected. You may stop here or read on at your own leisure. ----- B) download and install from the source in the form of the Tarball or Zip file... you will not be affected. You may stop here or read on at your own leisure. ----- C) retrieve and install Biopython from the CVS repository... the Biopython devs would really like to hear from you! For those in category C, the change could mean that you will need a Subversion client installed on your computer. Clients exist for all major platforms, including Windows, OS X, and Linux. Subversion operates through HTTP/HTTPS (ports 80 and 443, respectively), and specifically uses WebDAV, an extended HTTP protocol. Though highly unusual, some organizations' networks may block WebDAV traffic. One way to check whether your organization does this is to attempt to checkout an existing Subversion repository from, for instance, a Google Code Project (all of which use Subversion repositories). For example, you can attempt to check out Kongulo . If you can checkout an existing repository, you will be ready to migrate to Biopython's Subversion repository once in place. If Subversion installation will not be possible, or your network indeed blocks WebDAV traffic, the Biopython devs need to know. 
We can support the CVS repository with read-only access and inject updates from the Subversion repository into the CVS repository. This involves a bit more work on the part of devs in setting up and supporting, so we will take this on only if the necessity exists. For clarity, Biopython will make a clean move to Subversion and drop CVS support unless we hear requests for legacy CVS support in advance of the migration. ----- If you're a developer of Biopython... you should be on the Biopython-dev list and have been following this thread: ----- We will document all things related to Biopython's migration to Subversion on the Biopython wiki at for interested parties. The developers look forward to a smooth transition and having Subversion in place to assist us in continually improving Biopython. Thank you for your time and feedback, Your friendly neighborhood Biopython developers From aloraine at gmail.com Tue Mar 27 01:31:05 2007 From: aloraine at gmail.com (Ann Loraine) Date: Mon, 26 Mar 2007 23:31:05 -0600 Subject: [BioPython] question regarding writing Seq objects in Fasta format Message-ID: <83722dde0703262231u15042538q479c2fb0b81a9590@mail.gmail.com> Hello, I have a question about how to write out Bio.Seq.Seq objects to a fasta format file. I've generated a lot of these by translating segments of genomic sequence -- see below. What objects or code should I use? It looks like Bio.SeqIO.FASTA.FastaWriter might be the right thing, but it doesn't appear to accept Bio.Seq.Seq objects. Do I need to create a new type of Seq-like object before I can use Bio.SeqIO.FASTA.FastaWriter? Thank you for your time! 
Yours,
Ann

xxx my code for generating the Seq objects I want to write

def feat2aaseq(feat,seq):
    """
    Function: translate the given feature
    Returns : a Bio.Seq.Seq [biopython]
    Args    : feat - feature.DNASeqFeature [not biopython]
              seq - a Bio.Seq.Seq, e.g., a chromosome
    """
    start = feat.start()
    end = feat.start()+feat.length()
    fullseq = seq.sequence
    subseq_ob = Seq(fullseq[start:end],IUPAC.unambiguous_dna)
    if feat.strand() == -1:
        subseq_ob = subseq_ob.reverse_complement()
    translator = Translate.unambiguous_dna_by_id[4]
    aaseq = translator.translate(subseq_ob)
    return aaseq

--
Ann Loraine, Assistant Professor
Departments of Genetics, Biostatistics, Computer and Information Sciences
Associate Scientist, Comprehensive Cancer Center
University of Alabama at Birmingham
http://www.transvar.org
205-996-4155

From biopython at maubp.freeserve.co.uk Tue Mar 27 07:35:54 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 27 Mar 2007 20:35:54 +0900
Subject: [BioPython] question regarding writing Seq objects in Fasta format
In-Reply-To: <83722dde0703262231u15042538q479c2fb0b81a9590@mail.gmail.com>
References: <83722dde0703262231u15042538q479c2fb0b81a9590@mail.gmail.com>
Message-ID: <320fb6e00703270435k34390ce2y46937ab07578d7a7@mail.gmail.com>

On 3/27/07, Ann Loraine wrote:
> Hello,
>
> I have a question about how to write out Bio.Seq.Seq objects to a
> fasta format file.
>
> I've generated a lot of these by translating segments of genomic
> sequence -- see below.
>
> What objects or code should I use?

I would suggest you try the new Bio.SeqIO code described here; you will need BioPython 1.43 or later:

http://biopython.org/wiki/SeqIO

You'll need to "upgrade" your Seq objects into SeqRecord objects and give them an identifier before calling Bio.SeqIO.write() on them. We'd welcome feedback on this, for example if there are any errors in the newly updated documentation.
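[Editor's note: Peter's suggestion above — wrap each plain Seq in a SeqRecord that carries an identifier, then hand the records to Bio.SeqIO.write() — might look roughly like the sketch below. The ids and sequences are invented for illustration, and the exact SeqRecord signature has shifted between Biopython releases.]

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# "Upgrade" plain Seq objects into SeqRecord objects so each one
# carries an identifier (ids and sequences here are made up).
records = [
    SeqRecord(Seq("MKVLA"), id="feature_1", description=""),
    SeqRecord(Seq("MSTNP"), id="feature_2", description=""),
]

# Bio.SeqIO.write() accepts a handle; a StringIO stands in for a file.
handle = StringIO()
SeqIO.write(records, handle, "fasta")
fasta_text = handle.getvalue()
print(fasta_text)
```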
> It looks like Bio.SeqIO.FASTA.FastaWriter might be the right thing,
> but it doesn't appear to accept Bio.Seq.Seq objects. Do I need to
> create a new type of Seq-like object before I can use
> Bio.SeqIO.FASTA.FastaWriter?

We plan to mark that particular (undocumented) bit of BioPython as deprecated in the next release - so I can't really recommend using it.

Personally, before working on Bio.SeqIO I used to write fasta files "by hand" using something like this:

handle = open("example.faa", "w")
for identifier, seq in some_list_of_tuples :
    handle.write(">%s\n%s\n" % (identifier, seq.tostring()))
handle.close()

where identifier is a string, and seq is a BioPython Seq object. There is a lot to be said for doing it "by hand" if you want full control over the description, for example.

Peter

From chris.lasher at gmail.com Wed Mar 28 13:19:12 2007
From: chris.lasher at gmail.com (Chris Lasher)
Date: Wed, 28 Mar 2007 13:19:12 -0400
Subject: [BioPython] Biopython to migrate to Subversion
In-Reply-To: <128a885f0703240935p5c139736yfe3142bdbc9d6ac4@mail.gmail.com>
References: <128a885f0703240935p5c139736yfe3142bdbc9d6ac4@mail.gmail.com>
Message-ID: <128a885f0703281019k6c64807dpa4e5d54621c944cc@mail.gmail.com>

I have a correction to make. Again, this only affects those of you who obtain your Biopython code via CVS.

On 3/24/07, Chris Lasher wrote:
> Subversion operates through HTTP/HTTPS (ports 80 and 443,
> respectively), and specifically uses WebDAV, an extended HTTP
> protocol. Though highly unusual, some organizations networks' may
> block WebDAV traffic. One way to check whether your organization does
> this is to attempt to checkout an existing Subversion repository from,
> for instance, a Google Code Project (all of which use Subversion
> repositories). For example, you can attempt to check out Kongulo
> . If you can checkout
> an existing repository, you will be ready to migrate to Biopython's
> Subversion repository once in place.
The Subversion server runs through SSH, *not* through WebDAV, and so is accessed in the same way that the CVS repository is now. If you can access the CVS repository now, you can access the Subversion repository once we implement it. Therefore, the only case in which we would need to provide legacy support is if you cannot get Subversion installed. If this is the case, please notify me as soon as possible.

Thanks,
Chris

From arareko at campus.iztacala.unam.mx Sat Mar 3 22:32:46 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Sat, 03 Mar 2007 16:32:46 -0600
Subject: [BioPython] [Bioperl-l] New Article on Approaches to Web Development for Bioinformatics
In-Reply-To: <1ad8057e0703021842y683853f5k1c97dbf362f20dda@mail.gmail.com>
References: <1ad8057e0703021842y683853f5k1c97dbf362f20dda@mail.gmail.com>
Message-ID: <45E9F78E.8040406@campus.iztacala.unam.mx>

Hi Alex,

I think you've put a very nice & concise introductory article. I'd like to comment a little on some sections I've read:

* Introduction

> "Given that you have an idea for analyzing or presenting data in a
> particular was, a complete bioinformatics web application depends of
> these basic pieces, which is what this article is all about:
>
> 1. A source of data...
> 2. An application programming language...
> 3. A web application platform...
> 4. Optionally, a data store...
> 5. Optionally, you would reuse software tools..."

Even though you do a small mention about Web Services at the very end of the article (under Application Integration -> Programmatic Integration), I believe that Web Services can be another optional (or even basic) piece of a web application. In fact, many web applications consist only of Web Services without HTML user interfaces.

* Application Development Languages

> "There are many different programming platforms and tools available to
> solve bioinformatics problems.
> It can be bewildering at first, but it
> makes more sense to build on top of some of these tools rather than
> build from scratch. Some the problems with using these tools for a
> bioinformatics portal are
>
> 1. Many tools are written...
> 2. Some tools have particular prerequisites...
> 3. Many may not be in a form...
> 4. The context that gives meaning...
>
> Standardization on a particular platform can help manageability but
> for most organizations a compromise between standardization and
> adoption of several different platforms will allow many people to
> develop software in platforms that they are already comfortable with
> and allow the reuse of a large amount of freely available software..."

I would add to the problems list the fact that building web (or other kinds of) applications on top of a platform whose codebase is constantly evolving can make them very difficult to maintain. The case of EnsEMBL comes to my mind here: they opted to stick with BioPerl 1.2.3 as a core library and haven't moved onto a higher version of it because the EnsEMBL code is so vast that a simple upgrade of BioPerl would break a lot of their code. AFAIK, it's because of this and the slowness of some parts of BioPerl that EnsEMBL is gradually saying goodbye to BioPerl.

Also, I think that depending on the amount of available code you plan to import into your application, sometimes having a whole platform at the very bottom can add unnecessary extra weight to your application. More weight can mean less speed, and this is critical in web development.

* Application Integration -> Navigation

> "The basic way that users will navigate into and around your
> application should be using HTTP GET and POST requests with specific
> URL's. Users bookmark these URL's and other applications will link to
> them.
> Most applications developers did not realize it at first, but
> these URL's are, in fact, an interface into your application that you
> must maintain in a consistent way as you change and evolve your
> software. Otherwise, they will find dead links..."

Just as I clicked the bookmark button for your article :) The same principle could apply to its filenames. A URL of the form http://medicalcomputing.net/tools_dna17.php is less indicative of the real content of the article and can mislead potential readers. Optimising the URL's will also make them easier for search engines to index; something like http://medicalcomputing.net/web-development-bioinformatics17.php would do the trick.

To conclude my comments, I was surprised to see a section about BioPHP and not about other better-known toolkits like BioPython or BioRuby. What about their role in web development? Python is also a common language for web programming, and with all the recent *hot* stuff like Ruby On Rails, it's very likely that both Bio* toolkits are more than ready for deploying web applications. I'm Cc'ing this to their respective mailing lists to see if someone wants to give you some feedback about them in order to complement your article. Other than that, I really liked your work :)

Cheers,
Mauricio.

Alex Amies wrote:
> I have written an article on Approaches to Web Development for
> Bioinformatics at
>
> http://medicalcomputing.net/tools_dna1.php
>
> There is a fairly large section on BioPerl at
>
> http://medicalcomputing.net/tools_dna13.php
>
> I hope that someone gets something useful out of it. I also looking for
> feedback on it and, in particular, please let me know about any mistakes in
> it.
>
> The intent of the article is to give an overview of various approaches to
> developing web based tools for bioinformatics.
> It describes the alternatives
> at each layer of the system, including the data layer and sources of data,
> the application programming layer, the web layer, and bioinformatics tools
> and software libraries.
>
> Alex
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From alexamies at gmail.com Sun Mar 4 03:09:51 2007
From: alexamies at gmail.com (Alex Amies)
Date: Sat, 3 Mar 2007 19:09:51 -0800
Subject: [BioPython] [Bioperl-l] New Article on Approaches to Web Development for Bioinformatics
In-Reply-To: <45E9F78E.8040406@campus.iztacala.unam.mx>
References: <1ad8057e0703021842y683853f5k1c97dbf362f20dda@mail.gmail.com> <45E9F78E.8040406@campus.iztacala.unam.mx>
Message-ID: <1ad8057e0703031909v4880f5f1t3c4159b75c36bcca@mail.gmail.com>

Mauricio,

Thanks for your comments. You are right that I could have said a lot more about web services. I plan on doing that but I haven't got there yet. Actually, with all the hype about web services I have been surprised to find the programming model so complicated. As you mention, I certainly could have thought out my own URL's better.

I have been surprised not to find more PHP activity in bioinformatics. To me, besides being a lightweight and pleasant language to program in, it is incredibly economical for hosting Internet applications, and there is a huge open source community around PHP in general. The same can be said of Perl. It is because of my own ignorance and lack of time that I have not investigated Python and Ruby. I may do so in the future and write about them.

Alex

On 3/3/07, Mauricio Herrera Cuadra wrote:
> Hi Alex,
>
> I think you've put a very nice & concise introductory article.
I'd like > to comment a little on some sections I've read: > > * Introduction > > > "Given that you have an idea for analyzing or presenting data in a > > particular was, a complete bioinformatics web application depends of > > these basic pieces, which is what this article is all about: > > > > 1. A source of data... > > 2. An application programming language... > > 3. A web application platform... > > 4. Optionally, a data store... > > 5. Optionally, you would reuse software tools..." > > Even though you do a small mention about Web Services at the very end of > the article (under Application Integration -> Programmatic Integration), > I believe that Web Services can be another optional (or even basic) > piece of a web application. In fact, many web applications consist only > of Web Services without HTML user interfaces. > > * Application Development Languages > > > "There are many different programming platforms and tools available to > > solve bioinformatics problems. It can be bewildering at first, but it > > makes more sense to build on top of some of these tools rather than > > build from scratch. Some the problems with using these tools for a > > bioinformatics portal are > > > > 1. Many tools are written... > > 2. Some tools have particular prerequisites... > > 3. Many may not be in a form... > > 4. The context that gives meaning... > > > > Standardization on a particular platform can help manageability but > > for most organizations a compromise between standardization and > > adoption of several different platforms will allow many people to > > develop software in platforms that they are already comfortable with > > and allow the reuse of a large amount of freely available software..." > > I would add to the problems list the fact that building web (or other > kind of) applications on top of a platform whose codebase is evolving > constantly, can make them very difficult to maintain. 
The case of > EnsEMBL comes to my mind here: they opted to stick with BioPerl 1.2.3 as > a core library and haven't moved onto a higher version of it because the > EnsEMBL code is so vast, that a simple upgrade of BioPerl would break a > lot of their code. AFAIK, it's because of this and the slowness at some > parts of BioPerl that EnsEMBL is gradually saying goodbye to BioPerl. > > Also, I think that depending on the amount of available code you plan to > import into your application, sometimes having a whole platform at the > very bottom can add unnecessary extra weight to your application. More > weight could be equal to less speed, this is critical in web development. > > * Application Integration -> Navigation > > > "The basic way that users will navigate into and around your > > application should be using HTTP GET and POST requests with specific > > URL's. Users bookmark these URL's and other applications will link to > > them. Most applications developers did not realize it at first, but > > these URL's are, in fact, an interface into your application that you > > must maintain in a consistent way as you change and evolve your > > software. Otherwise, they will find dead links..." > > Just as I clicked the bookmark button for your article :) The same > principle could apply to its filenames. A URL of the form: > http://medicalcomputing.net/tools_dna17.php is less indicative of the > real content of the article and can mislead potential readers. > Optimising the URL's will make them better to be indexed by search > engines, something like: > http://medicalcomputing.net/web-development-bioinformatics17.php would > do the trick. > > To conclude my comments, I was surprised to see a section about BioPHP > and not about other more-known toolkits like BioPython or BioRuby. What > about their role in web development? 
Python is also a common language > for web programming and with all the recent *hot* stuff like Ruby On > Rails, it's very likely that both Bio* toolkits are more than ready for > deploying web applications. I'm Cc'ing this to their respective mailing > lists to see if someone wants to give you some feedback about them in > order to complement your article. Other than that, I really liked your > work :) > > Cheers, > Mauricio. > > Alex Amies wrote: > > I have written an article on Approaches to Web Development for > > Bioinformatics at > > > > http://medicalcomputing.net/tools_dna1.php > > > > There is a fairly large section on BioPerl at > > > > http://medicalcomputing.net/tools_dna13.php > > > > I hope that someone gets something useful out of it. I also looking for > > feedback on it and, in particular, please let me know about any mistakes in > > it. > > > > The intent of the article is to give an overview of various approaches to > > developing web based tools for bioinformatics. It describes the alternatives > > at each layer of the system, including the data layer and sources of data, > > the application programming layer, the web layer, and bioinformatics tools > > and software libraries. 
> >
> > Alex
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM
>

From shahs at MIT.EDU Sun Mar 4 04:14:59 2007
From: shahs at MIT.EDU (Hossein Shahsavari)
Date: Sat, 03 Mar 2007 23:14:59 -0500
Subject: [BioPython] IOError: [Errno 2] No such file or directory:
Message-ID: <20070303231459.s0pe4qpb1128o0gw@webmail.mit.edu>

Hello,

I receive the following error when I am trying to access a file called HISTORY in another file with this command

template = '~/CSH/HISTORY'

and I get this error:

IOError: [Errno 2] No such file or directory: '~/CSH/HISTORY'

I use Python in a Linux environment. I appreciate any suggestions/comments.

Hossein Shahsavari

From biopython at maubp.freeserve.co.uk Sun Mar 4 11:58:07 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 04 Mar 2007 11:58:07 +0000
Subject: [BioPython] IOError: [Errno 2] No such file or directory:
In-Reply-To: <20070303231459.s0pe4qpb1128o0gw@webmail.mit.edu>
References: <20070303231459.s0pe4qpb1128o0gw@webmail.mit.edu>
Message-ID: <45EAB44F.1080909@maubp.freeserve.co.uk>

Hossein Shahsavari wrote:
> Hello,
>
> I receive the following error when I am trying to access a file called HISTORY
> in another file with this command
>
> template = '~/CSH/HISTORY'
>
> and I get this error:
>
> IOError: [Errno 2] No such file or directory: '~/CSH/HISTORY'
>
> I use Python in a Linux environment. I appreciate any suggestions/comments.

If you had posted the Python code, it would be easier to guess what is going wrong. What does this do?

import os
template = '~/CSH/HISTORY'
print os.path.isfile(template)

That should print either True or False.
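[Editor's note: the underlying issue is that Python's open() and os.path functions treat the shell's '~' literally; the standard library's os.path.expanduser() performs the expansion explicitly. A minimal sketch, using the path from Hossein's question:]

```python
import os

# open() and os.path.isfile() do not expand '~' themselves;
# expanduser() replaces a leading '~' with the user's home directory.
template = os.path.expanduser('~/CSH/HISTORY')
print(template)
print(os.path.isfile(template))
```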
You might also try replacing the tilde ('~') with the actual path of your home folder, something like this typically:

template = '/home/username/CSH/HISTORY'

P.S. Have you checked the case? Linux and Unix are case sensitive.

Peter

From shahs at MIT.EDU Sun Mar 4 16:57:34 2007
From: shahs at MIT.EDU (Hossein Shahsavari)
Date: Sun, 04 Mar 2007 11:57:34 -0500
Subject: [BioPython] IOError: [Errno 2] No such file or directory:
In-Reply-To: <45EAB44F.1080909@maubp.freeserve.co.uk>
References: <20070303231459.s0pe4qpb1128o0gw@webmail.mit.edu> <45EAB44F.1080909@maubp.freeserve.co.uk>
Message-ID: <20070304115734.xe5ms6c7vkv8k4wk@webmail.mit.edu>

Hi,

Thanks for your guidance. The problem was the tilde ('~'), which I replaced with the correct path, and now it works.

I have another, maybe simple, question: I have 26 files, namely output1, output2,...,output26. I can read them one by one, but how can I read them all in an easier way, like a loop? I put a "for loop" by setting

i=1

for i in range(1,27)
    template='outputi'

however, I got the same error as above, IOError: [Errno 2] No such file or directory: 'outputi'. It seems "i" can't be attached to the output.

Thanks a lot

Hossein

From biopython at maubp.freeserve.co.uk Sun Mar 4 17:34:26 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 04 Mar 2007 17:34:26 +0000
Subject: [BioPython] IOError: [Errno 2] No such file or directory:
In-Reply-To: <20070304115734.xe5ms6c7vkv8k4wk@webmail.mit.edu>
References: <20070303231459.s0pe4qpb1128o0gw@webmail.mit.edu> <45EAB44F.1080909@maubp.freeserve.co.uk> <20070304115734.xe5ms6c7vkv8k4wk@webmail.mit.edu>
Message-ID: <45EB0322.8050809@maubp.freeserve.co.uk>

Hossein Shahsavari wrote:
> I have another maybe simple question:
>
> I have 26 files namely output1, output2,...,output26. I can read them
> one by one but how can read them all by an easier way like a loop ?
> I put a "for loop" by setting
>
> i=1
>
> for i in range(1,27)
> template='outputi'
>
> however, I got the same error as above IOError: [Errno 2] No such file or
> directory: 'outputi'. It seems "i" can't be attached to the output.
>
> Thanks alot
>
> Hossein

You should really try a basic introduction to Python. There are lots of tutorials online, and great books too. Your questions so far are not really related to BioPython at all.

Note that indentation is very important in Python. You were also missing the colon at the end of the for line. More importantly, the following line just sets the variable template to the string 'outputi', and doesn't do anything with the variable i:

template='outputi'

You want to do something like this:

for i in range(1,27) :
    template = 'output' + str(i)
    print template

Good luck.

Peter

From lucks at fas.harvard.edu Mon Mar 5 14:13:02 2007
From: lucks at fas.harvard.edu (Julius Lucks)
Date: Mon, 5 Mar 2007 09:13:02 -0500
Subject: [BioPython] blast parsing errors
Message-ID: <1E0F24AF-A4F4-4818-B7AA-5D35BF7EA260@fas.harvard.edu>

Hi all,

I am trying to parse a bunch of blast results that I gather via NCBIWWW.qblast().
I have the following code snippet:

-----------

from Bio import Fasta
from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML
import StringIO
import re

#BLAST cutoff
cutoff = 1e-4

#Create a fasta record: title and seq are given
title = 'test'
seq = 'ATCG'
fasta_rec = Fasta.Record()

#Sanitize title - blast does not like single quotes or \n in titles
title = re.sub("'","prime",title)
title = re.sub("\n","",title)
fasta_rec.title = title
fasta_rec.sequence = seq

b_parser = NCBIXML.BlastParser()

result_handle = NCBIWWW.qblast('blastp','nr',fasta_rec,ncbi_gi=1,expect=cutoff,format_type="XML",entrez_query="Viruses [ORGN]")
blast_results = result_handle.read()

blast_handle = StringIO.StringIO(blast_results)
b_record = b_parser.parse(blast_handle)

for alignment in b_record.alignments:
    titles = alignment.title.split('>')
    print titles

-------------

The issue is that sometimes the blast parser chokes, with tracebacks like:

  File "./src/create_annotations.py", line 96, in get_blast_annotations
    b_record = b_parser.parse(blast_handle)
  File "/sw/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py", line 112, in parse
    self._parser.parse(handler)
  File "/sw/lib/python2.5/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/sw/lib/python2.5/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/sw/lib/python2.5/xml/sax/expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "/sw/lib/python2.5/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: :7:70: not well-formed (invalid token)

I am not sure which alignment it choked on, but I would like to rescue it with a try/except block if possible. But it seems to me that if I did something like

try:
    b_record = b_parser.parse(blast_handle)
except:
    ...

Then I would not get anything in b_record if an error is raised during parsing.
Rather, I would like to have whatever has been successful up to the point of the error stored in b_record.

Is there any way to do this via the BioPython API, or do I have to dig into the Python XML parsing code?

Also, if anyone has a better idea of how to structure this code, I would be very appreciative.

Cheers,

Julius

-----------------------------------------------------
http://openwetware.org/wiki/User:Lucks
-----------------------------------------------------

From biopython at maubp.freeserve.co.uk Mon Mar 5 14:55:38 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 05 Mar 2007 14:55:38 +0000
Subject: [BioPython] blast parsing errors
In-Reply-To: <1E0F24AF-A4F4-4818-B7AA-5D35BF7EA260@fas.harvard.edu>
References: <1E0F24AF-A4F4-4818-B7AA-5D35BF7EA260@fas.harvard.edu>
Message-ID: <45EC2F6A.6090200@maubp.freeserve.co.uk>

Julius Lucks wrote:
> Hi all,
>
> I am trying to parse a bunch of blast results that I gather via
> NCBIWWW.qblast(). I have the following code snipit:

You didn't say which version of BioPython you are using; I would guess 1.42 - there have been some Bio.Blast changes since then.

Your example sequence was "ATCG", but you ran a "blastp" search. Did you really mean the peptide Ala-Thr-Cys-Gly here?

If you meant to do a nucleotide search, try using "blastn" and "nr" instead. That should work better.

However, there is still something funny going on. I tried your example as is using the CVS code, and it fails before it even gets the blast results back...

Could you save the XML output to a file and email it to me; or even better, file a bug and attach the XML file to it.
Thanks

Peter

From biopython at maubp.freeserve.co.uk Mon Mar 5 15:12:25 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 05 Mar 2007 15:12:25 +0000
Subject: [BioPython] blast parsing errors
In-Reply-To: <1E0F24AF-A4F4-4818-B7AA-5D35BF7EA260@fas.harvard.edu>
References: <1E0F24AF-A4F4-4818-B7AA-5D35BF7EA260@fas.harvard.edu>
Message-ID: <45EC3359.1030802@maubp.freeserve.co.uk>

Julius Lucks wrote:
> Hi all,
>
> I am trying to parse a bunch of blast results that I gather via
> NCBIWWW.qblast(). I have the following code snipit:

I am wondering if your trivial example triggered some "unusual" error page from the NCBI...

I would suggest you update to CVS, as we have made a lot of changes to the Blast XML support. You would probably be safe just updating the following Bio.Blast files, located here on your machine:

/sw/lib/python2.5/site-packages/Bio/Blast/NCBIStandalone.py
/sw/lib/python2.5/site-packages/Bio/Blast/NCBIWWW.py
/sw/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py
/sw/lib/python2.5/site-packages/Bio/Blast/Record.py

If you don't know how to use CVS, then just back up the originals and replace them with the new files, downloaded one by one from here:

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/?cvsroot=biopython

----------------------------------------------------------------------

This works for me using the CVS version of BioPython.
I have just made a string, rather than messing about with a fasta record object, to keep the code short:

#Protein example, BLASTP
from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML

#BLAST cutoff
cutoff = 1e-4

fasta_rec = ">GI:121308427\nrslgmevmhernahnfpldlaavevpsing"

b_parser = NCBIXML.BlastParser()

result_handle = NCBIWWW.qblast('blastp', 'nr', fasta_rec, ncbi_gi=1, expect=cutoff, format_type="XML", entrez_query="Viruses [ORGN]")

#This returns a record iterator, changed after release of BioPython 1.42
b_records = b_parser.parse(result_handle)

for b_record in b_records :
    print "%s found %i results" % (b_record.query, len(b_record.alignments))
    for alignment in b_record.alignments:
        titles = alignment.title.split('>')
        print titles

Or, if you wanted to do a nucleotide BLASTN search, try:

fasta_rec = '>GI:121308427\nttagccatttatagatggaacttcaacagcagctaagtc' \
          + 'tagagggaaattgtgagcattacgctcgtgcatgacctccataccaagagatct'

and replace 'blastp' with 'blastn' in the call to qblast().

Peter

From mdehoon at c2b2.columbia.edu Mon Mar 5 15:36:43 2007
From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon)
Date: Mon, 05 Mar 2007 10:36:43 -0500
Subject: [BioPython] blast parsing errors
In-Reply-To: <1E0F24AF-A4F4-4818-B7AA-5D35BF7EA260@fas.harvard.edu>
References: <1E0F24AF-A4F4-4818-B7AA-5D35BF7EA260@fas.harvard.edu>
Message-ID: <45EC390B.8020400@c2b2.columbia.edu>

Julius Lucks wrote:
> seq = 'ATCG'
> ...
> fasta_rec.sequence = seq
> ...
> result_handle = NCBIWWW.qblast
> ('blastp','nr',fasta_rec,ncbi_gi=1,expect=cutoff,format_type="XML",entre

You have a nucleotide sequence but are running a protein-protein blast with blastp. If you run this exact search with Blast through a browser, it will show you an error message. The function _parse_qblast_ref_page(handle), which is called from NCBIWWW.qblast, chokes on this error message.
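[Editor's note: one crude way to detect such an error page before handing the response to the XML parser is to peek at the start of the text: a successful BLAST XML report begins with an XML declaration, while NCBI error responses come back as HTML. The guard function below is an illustrative sketch under that assumption, not part of Biopython's API.]

```python
def looks_like_blast_xml(text):
    """Heuristic guard: True if text plausibly starts a BLAST XML report.

    A real XML report opens with an XML declaration; NCBI error
    responses are HTML pages. This check is an assumption made for
    illustration, not an official NCBI or Biopython contract.
    """
    head = text.lstrip()[:200].lower()
    return head.startswith("<?xml") and "<html" not in head

# An HTML error page would be rejected before reaching the XML parser:
print(looks_like_blast_xml('<?xml version="1.0"?>\n<!DOCTYPE BlastOutput>'))
print(looks_like_blast_xml('<html><head><title>NCBI Blast</title></head></html>'))
```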
If you want to make this more robust, one solution might be to check for error messages returned by the Blast server in _parse_qblast_ref_page.

By the way, the code can be simplified as follows:

from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML

#BLAST cutoff
cutoff = 1e-4

#Create a fasta record: title and seq are given
seq = 'ATCG'

b_parser = NCBIXML.BlastParser()
result_handle = NCBIWWW.qblast('blastn', 'nr', seq, ncbi_gi=1, expect=cutoff, format_type="XML", entrez_query="Viruses [ORGN]")
b_records = b_parser.parse(result_handle)
b_record = b_records[0]

for alignment in b_record.alignments:
    titles = alignment.title.split('>')
    print titles

--------------------------------------------

Note: the BlastParser currently in CVS returns a list of Blast records instead of a single Blast record, hence the b_records[0] above.

Btw, with NCBIXML currently in CVS, you don't need to create b_parser first:

result_handle = NCBIWWW.qblast('blastn', 'nr', seq, ncbi_gi=1, expect=cutoff, format_type="XML", entrez_query="Viruses [ORGN]")
b_records = NCBIXML.parse(result_handle)
b_record = b_records.next()

-----------------------------------------------

--Michiel.

--
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1130 St Nicholas Avenue
New York, NY 10032

From winter at biotec.tu-dresden.de Mon Mar 5 15:07:00 2007
From: winter at biotec.tu-dresden.de (Christof Winter)
Date: Mon, 05 Mar 2007 16:07:00 +0100
Subject: [BioPython] blast parsing errors
In-Reply-To: <1E0F24AF-A4F4-4818-B7AA-5D35BF7EA260@fas.harvard.edu>
References: <1E0F24AF-A4F4-4818-B7AA-5D35BF7EA260@fas.harvard.edu>
Message-ID: <45EC3214.5050100@biotec.tu-dresden.de>

Running your example, I get:

>>> ## working on region in file /tmp/python-18415Uda.py...
Traceback (most recent call last):
  File "", line 1, in ?
  File "/tmp/python-18415Uda.py", line 25, in ?
    result_handle = NCBIWWW.qblast('blastp','nr',fasta_rec,ncbi_gi=1,expect=cutoff,format_type="XML",entrez_query="Viruses [ORGN]")
  File "/var/lib/python-support/python2.4/Bio/Blast/NCBIWWW.py", line 1091, in qblast
    rid, rtoe = _parse_qblast_ref_page(handle)
  File "/var/lib/python-support/python2.4/Bio/Blast/NCBIWWW.py", line 1133, in _parse_qblast_ref_page
    return rid, int(rtoe)
ValueError: invalid literal for int(): NCBI Blast

> title = re.sub("\n","",title)
> fasta_rec.title = title
> fasta_rec.sequence = seq
>
> b_parser = NCBIXML.BlastParser()
>
> result_handle = NCBIWWW.qblast
> ('blastp','nr',fasta_rec,ncbi_gi=1,expect=cutoff,format_type="XML",entre
> z_query="Viruses [ORGN]")
> blast_results = result_handle.read()
>
> blast_handle = StringIO.StringIO(blast_results)
> b_record = b_parser.parse(blast_handle)
>
> for alignment in b_record.alignments:
> titles = alignment.title.split('>')
> print titles
>
> -------------
>
> The issue is sometimes the blast parser chokes with tracebacks like:
>
> File "./src/create_annotations.py", line 96, in get_blast_annotations
> b_record = b_parser.parse(blast_handle)
> File "/sw/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py", line 112, in parse
> self._parser.parse(handler)
> File "/sw/lib/python2.5/xml/sax/expatreader.py", line 107, in parse
> xmlreader.IncrementalParser.parse(self, source)
> File "/sw/lib/python2.5/xml/sax/xmlreader.py", line 123, in parse
> self.feed(buffer)
> File "/sw/lib/python2.5/xml/sax/expatreader.py", line 211, in feed
> self._err_handler.fatalError(exc)
> File "/sw/lib/python2.5/xml/sax/handler.py", line 38, in fatalError
> raise exception
> xml.sax._exceptions.SAXParseException: :7:70: not well-formed (invalid token)
>
> I am not sure which alignment it choked on, but I would like to
> rescue it with a try/except block if possible. But it seems to me
> that if I did something like
>
> try:
> b_record = b_parser.parse(blast_handle)
> except:
> ...
>
> Then I would not get anything in b_record if an error raised in the
> parsing. Rather, I would like to have whatever has been successful
> up to the point of the error stored in b_record.
>
> Is there any way to do this via the BioPython API, or do I have to
> dig into the python xml parsing code?
>
> Also, if anyone has a better idea of how to structure this code, I
> would be very appreciative.
>
> Cheers,
>
> Julius
>
> -----------------------------------------------------
> http://openwetware.org/wiki/User:Lucks
> -----------------------------------------------------
>
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

From lucks at fas.harvard.edu Mon Mar 5 16:24:58 2007
From: lucks at fas.harvard.edu (Julius Lucks)
Date: Mon, 5 Mar 2007 11:24:58 -0500
Subject: [BioPython] blast parsing errors
In-Reply-To: <45EC2F6A.6090200@maubp.freeserve.co.uk>
References: <1E0F24AF-A4F4-4818-B7AA-5D35BF7EA260@fas.harvard.edu> <45EC2F6A.6090200@maubp.freeserve.co.uk>
Message-ID: <27D93A40-C5AC-4708-BF4C-0ADCFD413B46@fas.harvard.edu>

Thanks guys,

You are right - I am using BioPython 1.42, and python2.5 installed via fink on Mac OS X. I meant to use an amino acid sequence for the seq variable, and I have included the revised code snippet, which uses the protein sequence that gave me trouble in the first place. However, there is no problem when using the current CVS code. Thanks for all of your help!

I have 3 questions:

1.) Is the documentation for the new NCBIXML and NCBIWWW up to date?

2.) Why is NCBIXML.parse returning an iterator in this case, since there is only one result? Or in other words, what are the use cases where an iterator is necessary?

3.) How are the fink packages of Biopython maintained? I am using the fink unstable tree, which means that I am getting the most current version that fink has.
If Biopython 1.44 is substantially different from 1.42 (current fink), can we update the fink version faster than we currently are?

Cheers,

Julius

---------- code that works in Biopython 1.44 --------

from Bio import Fasta
from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML
import StringIO
import re

# BLAST cutoff
cutoff = 1e-4

# Create a fasta record: title and seq are given
title = 'test'
seq = '\
MESFSVQAYLKATDNNFVSTFKDAAKQVQNF\
EKNTNSTMSTVGKVATSTGKTLTKAVTVPII\
GIGVAAAKIGGDFESQMSRVKAISGATGSSF\
EELRQQAIDLGAKTAFSAKESASGMENLASA\
GFNAKEIMEAMPGLLDLAAVSGGDVALASEN\
AATALRGFNLDASQSGHVANVFAKAAADTNA\
EVGDMGEAMKYIAPVANSMGLSIEEVSAAIG\
IMSDAGIKGSQAGTSLRGALSRLADPTDAMQ\
AKMDELGLSFYDSEGKMKPLKDQIGMLKDAF\
KGLTPEQQQNALVTLYGQESLSGMMALIDKG\
PDKLGKLTESLKNSDGAADKMAKTMQDNMNS\
SLEQMMGAFESAAIVVQKILAPAVRKVADSI\
SGLVDKFVSAPEPVQKMIVTIGLIVAAIGPL\
LVIFGQAVVTLQRVKVGFLALRSGLALIGGS\
FTAISLPVLGIIAAIAAVIAIGILVYKNWDK\
ISKFGKEVWANVKKFASDAAEVIKEKWGDIT\
QWFSDTWNNIKNGAKGLWDGTVQGAKNAVDS\
VKNAWNGIKEWFTNLWKGTTSGLSSAWDSVT\
TTLAPFVETIKTIFQPILDFFSGLWGQVQTI\
FGSAWEIIKTVVMGPVLLLIDLITGDFNQFK\
KDFAMLWQTLFTNIQTLVTTYVQIVVGFFTA\
WGQTVSNIWTTVVNTIQSLWGAFTTWVINMA\
KSIVDGIVNGWNSFKQGTVDLWNATVQWVKD\
TWASFKQWVVDSANAIVNGVKQGWENLKQGT\
IDLWNGMINGLKGIWDGLKQSVRNLIDNVKT\
TFNNLKNINLLDIGKAIIDGLVKGLKKKWED\
GMKFISGIGDWIRKHKGPIRKDRKLLIPAGK\
AIMTGLNSGLTGGFRNVQSNVSGMGDMIANA\
INSDYSVDIGANVAAANRSISSQVSHDVNLN\
QGKQPASFTVKLGNQIFKAFVDDISNAQGQA\
INLNMGF*'

fasta_rec = Fasta.Record()

# Sanitize title - blast does not like single quotes or \n in titles
title = re.sub("'","prime",title)
title = re.sub("\n","",title)
fasta_rec.title = title
fasta_rec.sequence = seq

result_handle = NCBIWWW.qblast('blastp','nr',fasta_rec,ncbi_gi=1,expect=cutoff,format_type="XML",entrez_query="Viruses [ORGN]")

b_records = NCBIXML.parse(result_handle)

for b_record in b_records:
    print "%s found %i results" % (b_record.query, len(b_record.alignments))
    for alignment in b_record.alignments:
        titles = alignment.title.split('>')
        print titles

----------
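[Editor's sketch] Earlier in this thread the open question was how to rescue the records that parsed successfully before NCBIXML hits a not-well-formed token. Biopython does not expose a switch for this, but the behaviour of the underlying SAX layer can be demonstrated with the standard library alone: handler events delivered before the SAXParseException is raised are kept. The `HitCollector` class and the toy `<hit>` markup below are invented purely for illustration (modern Python 3 syntax, not Biopython's actual parser):

```python
import xml.sax

class HitCollector(xml.sax.ContentHandler):
    """Collect the text of every complete <hit> element seen so far."""
    def __init__(self):
        self.hits = []
        self._buf = None
    def startElement(self, name, attrs):
        if name == "hit":
            self._buf = []
    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)
    def endElement(self, name):
        if name == "hit":
            self.hits.append("".join(self._buf))
            self._buf = None

# Two well-formed hits, then a bare '&' that triggers the same
# "not well-formed (invalid token)" fatal error as in the traceback.
broken_xml = b"<results><hit>A</hit><hit>B</hit><hit>C&</results>"

collector = HitCollector()
try:
    xml.sax.parseString(broken_xml, collector)
except xml.sax.SAXParseException:
    pass  # keep whatever was delivered before the bad token

print(collector.hits)  # hits completed before the error survive
```

The same wrap-and-keep pattern applies to any handler-based parser: the exception destroys nothing that was already handed to your callbacks, only the records still in flight.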
----------------------------------------------------- http://openwetware.org/wiki/User:Lucks ----------------------------------------------------- On Mar 5, 2007, at 9:55 AM, Peter wrote: > Julius Lucks wrote: >> Hi all, >> I am trying to parse a bunch of blast results that I gather via >> NCBIWWW.qblast(). I have the following code snipit: > > You didn't say which version of BioPython you are using, I would > guess 1.42 - there have been some Bio.Blast changes since than. > > Your example sequence was "ATCG", but you ran a "blastp" search. > Did you really mean the peptide Ala-Thr-Cys-Gly here? > > If you meant to do a nucleotide search, try using "blastn" and "nr" > instead. That should work better. > > However, there is still something funny going on. I tried your > example as is using the CVS code, and it fails before it even gets > the blast results back... > > Could you save the XML output to a file and email it to me; or even > better file a bug an attach the XML file to the bug. > > Thanks > > Peter From mdehoon at c2b2.columbia.edu Mon Mar 5 16:49:53 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Mon, 05 Mar 2007 11:49:53 -0500 Subject: [BioPython] blast parsing errors In-Reply-To: <27D93A40-C5AC-4708-BF4C-0ADCFD413B46@fas.harvard.edu> References: <1E0F24AF-A4F4-4818-B7AA-5D35BF7EA260@fas.harvard.edu> <45EC2F6A.6090200@maubp.freeserve.co.uk> <27D93A40-C5AC-4708-BF4C-0ADCFD413B46@fas.harvard.edu> Message-ID: <45EC4A31.9010207@c2b2.columbia.edu> Julius Lucks wrote: > 1.) Is the documentation for the new NCBIXML and NBCIWWW up to date? No it is not. To ensure that the documentation on the website agrees with the current Biopython release, the idea was to update the documentation when the next Biopython release comes out. Originally we were planning to make a new Biopython release as soon as the new Bio.SeqIO code is done. 
However, I'd be happy to make a release in the immediate future without the new Bio.SeqIO, and make another one once Bio.SeqIO is ready. > 2.) Why is NCBIXML.parse returning an iterator in this case since there > is only one result? Or in other words, what are the use cases where an > iterator is necessary? If you're parsing multiple Blast search results at the same time. In other words, if the fasta file for the blast search looked like > gene1 ATAGCTACG... > gene2 ATCGATCGATGGCA... > gene3 .... Such a file can be very large, which is why we are using an iterator instead of a list. Now, one may argue that NCBIXML.parse should return a single record instead of an iterator if there's only one result. Others may argue that for consistency, it should always return an iterator. Either way is fine with me. Anybody have a strong opinion about this? > 3.) How are the fink packages of Biopython maintained? I don't know. But, it's not too difficult to install Biopython from the source distribution or from CVS. So if you want to be sure you have the latest version, you might want to try installing from CVS. --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From biopython at maubp.freeserve.co.uk Tue Mar 6 16:27:58 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 06 Mar 2007 16:27:58 +0000 Subject: [BioPython] Bio.Kabat and the Kabat Database Message-ID: <45ED968E.1010102@maubp.freeserve.co.uk> I've been looking though the modules in BioPython, and had a closer look at Bio.Kabat written by Katharine Lindner in 2001 to parse files from the Kabat database of proteins of immunological interest: http://www.kabatdatabase.com/ Quoting the website, > 01 September 2006 > > Interested parties may purchase the Database, in ASCII text > structured flat files, as well as an SQL relationship database (not > previously available), for $2250 US. 
> > This one-time license fee is unrestricted, except for distribution. > > Analysis Tools > > The searching and analysis tools are additionally available. > Included are generalized lookup, aligned sequence searching light > chain alignment, length distribution, positional correlation, > variability, and much more. Please contact for quote. Does anyone use the Bio.Kabat code? Could we (or should we) mark it as depreciated for the next release of BioPython? Peter From snakepit.rattlesnakes at gmail.com Mon Mar 12 09:59:25 2007 From: snakepit.rattlesnakes at gmail.com (Joydeep Mitra) Date: Mon, 12 Mar 2007 15:29:25 +0530 Subject: [BioPython] Retrieving the raw sequence from sequence object... Message-ID: <972566ff0703120259k3979c223r2172f631d48fa6fd@mail.gmail.com> Hi, I'm a student of bioinformatics (coming from a biological background). I've just started using biopython for parsing biological file formats. The Bio.Fasta module contains the fasta iterator object, which spits out sequence objects...of the form: Seq('CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGTTGAGATCACATAATAAT ...', IUPACAmbiguousDNA()) I want to retrieve the sequence in it's entirety and in raw format....how does one do that using an instance object? I've tried a few things without success...will be glad if some1 could show me how... Thanking in advance, Joy From sdavis2 at mail.nih.gov Mon Mar 12 10:20:11 2007 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Mon, 12 Mar 2007 06:20:11 -0400 Subject: [BioPython] Retrieving the raw sequence from sequence object... In-Reply-To: <972566ff0703120259k3979c223r2172f631d48fa6fd@mail.gmail.com> References: <972566ff0703120259k3979c223r2172f631d48fa6fd@mail.gmail.com> Message-ID: <200703120620.11871.sdavis2@mail.nih.gov> On Monday 12 March 2007 05:59, Joydeep Mitra wrote: > Hi, > I'm a student of bioinformatics (coming from a biological background). > > I've just started using biopython for parsing biological file formats. 
> The Bio.Fasta module contains the fasta iterator object, which spits out > sequence objects...of the form: > > Seq('CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGTTGAGATCACATAATAAT ...', > IUPACAmbiguousDNA()) > > I want to retrieve the sequence in it's entirety and in raw format....how > does one do that using an instance object? > I've tried a few things without success...will be glad if some1 could show > me how... If you have a sequence object, "myseq": myseq.tostring() See here for more details: http://biopython.org/DIST/docs/tutorial/Tutorial.html Section 2.2. Hope that helps. Sean From lucks at fas.harvard.edu Wed Mar 14 22:16:32 2007 From: lucks at fas.harvard.edu (Julius Lucks) Date: Wed, 14 Mar 2007 18:16:32 -0400 Subject: [BioPython] Biopython hackathon Message-ID: <21C70692-33D8-457E-AB5B-D4701E2704FB@fas.harvard.edu> Hi all, I was just chatting with Jason Stajich about the openbio hackathon that took place a few years ago. What biopython projects do people think would be appropriate for another hackathon in the near future? Are there things that have been on the TODO list for a while? New functionality that would benefit from a bunch of us getting together in one place (possibly with other openbio projects)? 
Cheers, Julius ----------------------------------------------------- http://openwetware.org/wiki/User:Lucks ----------------------------------------------------- From aloraine at gmail.com Wed Mar 14 23:43:23 2007 From: aloraine at gmail.com (Ann Loraine) Date: Wed, 14 Mar 2007 17:43:23 -0600 Subject: [BioPython] Biopython hackathon In-Reply-To: <21C70692-33D8-457E-AB5B-D4701E2704FB@fas.harvard.edu> References: <21C70692-33D8-457E-AB5B-D4701E2704FB@fas.harvard.edu> Message-ID: <83722dde0703141643q52c04a03l4d2b28926e8aff3a@mail.gmail.com> I hope you will consider the following two requests as possible hackathon activities: (1) If it does not already do this, it would be nice if the blast "plain text" (non-XML) parser would report the length of the target ("hit") sequence as well as the query. If I recall correctly, the last time I used the plain text blast parser, I had to measure the length of the targets by opening up the fasta copy of the blastable database and reading the lengths one-by-one. My database wasn't very big, so it wasn't a hassle to do this, but I can foresee situations where this kludge would fail. (2) Another request is for a Python interface to the Bio::Db::SQL database schema described in: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12368253. The part that seems particularly valuable would be code that to construct the location-based queries. My apologies if this already exists -- if yes, please let me know where I can find it!! All the best, Ann Loraine On 3/14/07, Julius Lucks wrote: > Hi all, > > I was just chatting with Jason Stajich about the openbio hackathon > that took place a few years ago. What biopython projects do people > think would be appropriate for another hackathon in the near future? > Are there things that have been on the TODO list for a while? New > functionality that would benefit from a bunch of us getting together > in one place (possibly with other openbio projects)? 
> > Cheers, > > Julius > > ----------------------------------------------------- > http://openwetware.org/wiki/User:Lucks > ----------------------------------------------------- > > > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From cjfields at uiuc.edu Thu Mar 15 00:29:43 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 14 Mar 2007 19:29:43 -0500 Subject: [BioPython] Biopython hackathon In-Reply-To: <83722dde0703141643q52c04a03l4d2b28926e8aff3a@mail.gmail.com> References: <21C70692-33D8-457E-AB5B-D4701E2704FB@fas.harvard.edu> <83722dde0703141643q52c04a03l4d2b28926e8aff3a@mail.gmail.com> Message-ID: <8FC21D46-FEC1-43FF-B770-BC4D05E569D5@uiuc.edu> On Mar 14, 2007, at 6:43 PM, Ann Loraine wrote: > ... > (2) Another request is for a Python interface to the Bio::Db::SQL > database schema described in: > > http://www.ncbi.nlm.nih.gov/entrez/query.fcgi? > cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12368253. > > The part that seems particularly valuable would be code that to > construct the location-based queries. Did you mean Bio::DB::GFF? AFAIK Lincoln is moving stuff over to a newer system that better facilitates GFF3, namely Bio::DB::SeqFeature. I'm not sure how well Bio::DB::GFF is supported currently. > My apologies if this already exists -- if yes, please let me know > where I can find it!! You can find out about the current state of GBrowse affairs by emailing the GBrowse mail list. Here's the sourceforge link: http://sourceforge.net/mailarchive/forum.php?forum_id=31947 Scott and Lincoln can indicate where their current focus is re: GFF3 and sequence feature database development. chris > All the best, > > Ann Loraine Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From aloraine at gmail.com Thu Mar 15 06:52:13 2007 From: aloraine at gmail.com (Ann Loraine) Date: Thu, 15 Mar 2007 00:52:13 -0600 Subject: [BioPython] Biopython hackathon In-Reply-To: <8FC21D46-FEC1-43FF-B770-BC4D05E569D5@uiuc.edu> References: <21C70692-33D8-457E-AB5B-D4701E2704FB@fas.harvard.edu> <83722dde0703141643q52c04a03l4d2b28926e8aff3a@mail.gmail.com> <8FC21D46-FEC1-43FF-B770-BC4D05E569D5@uiuc.edu> Message-ID: <83722dde0703142352y325698bag5d43c4a771bb1e5@mail.gmail.com> Thanks, I meant Bio::DB::GFF -- the schema shown in the paper. A simple schema that represents features on genomic sequence and easily supports fast region-based queries is what I'm after. The indexing scheme in the paper looked good, so I was hoping to find python code that would hide the details of formulating the SQL. My main goal is speed -- running the queries and then outputting data in GFF, bed, or DAS XML, as the need arises. -Ann On 3/14/07, Chris Fields wrote: > > On Mar 14, 2007, at 6:43 PM, Ann Loraine wrote: > > > ... > > (2) Another request is for a Python interface to the Bio::Db::SQL > > database schema described in: > > > > http://www.ncbi.nlm.nih.gov/entrez/query.fcgi? > > cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12368253. > > > > The part that seems particularly valuable would be code that to > > construct the location-based queries. > > Did you mean Bio::DB::GFF? AFAIK Lincoln is moving stuff over to a > newer system that better facilitates GFF3, namely > Bio::DB::SeqFeature. I'm not sure how well Bio::DB::GFF is supported > currently. > > > My apologies if this already exists -- if yes, please let me know > > where I can find it!! > > You can find out about the current state of GBrowse affairs by > emailing the GBrowse mail list. 
Here's the sourceforge link: > > http://sourceforge.net/mailarchive/forum.php?forum_id=31947 > > Scott and Lincoln can indicate where their current focus is re: GFF3 > and sequence feature database development. > > chris > > > All the best, > > > > Ann Loraine > > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > -- Ann Loraine, Assistant Professor Departments of Genetics, Biostatistics, Computer and Information Sciences Associate Scientist, Comprehensive Cancer Center University of Alabama at Birmingham http://www.transvar.org 205-996-4155 From mdehoon at c2b2.columbia.edu Fri Mar 16 19:09:10 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Fri, 16 Mar 2007 15:09:10 -0400 Subject: [BioPython] Bio.Kabat and the Kabat Database In-Reply-To: <45ED968E.1010102@maubp.freeserve.co.uk> References: <45ED968E.1010102@maubp.freeserve.co.uk> Message-ID: <45FAEB56.9080509@c2b2.columbia.edu> Peter wrote: > Does anyone use the Bio.Kabat code? Could we (or should we) mark it as > depreciated for the next release of BioPython? Since no users of Bio.Kabat came forward, I've marked it as deprecated for the upcoming release. This only means that importing Bio.Kabat will show a warning message, so the code is still usable. If still no users come forward, we can remove Bio.Kabat in a later release. --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From mdehoon at c2b2.columbia.edu Sat Mar 17 23:26:50 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Sat, 17 Mar 2007 19:26:50 -0400 Subject: [BioPython] Biopython release 1.43 Message-ID: <45FC793A.2010106@c2b2.columbia.edu> Dear Biopythoneers, We are pleased to announce the release of Biopython 1.43. 
This release includes a brand-new set of parsers in Bio.SeqIO by Peter Cock for reading biological sequence files in various formats, an updated Blast XML parser in Bio.Blast.NCBIXML, a new UniGene flat-file parser by Sean Davis, and numerous improvements and bug fixes in Bio.PDB, Bio.SwissProt, Bio.Nexus, BioSQL, and others. Believe it or not, even the documentation was updated. Source distributions and Windows installers are available from the Biopython website at http://biopython.org. My thanks to all code contributers who made this new release possible. --Michiel on behalf of the Biopython developers -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From rohini.damle at gmail.com Mon Mar 19 21:08:36 2007 From: rohini.damle at gmail.com (Rohini Damle) Date: Mon, 19 Mar 2007 13:08:36 -0800 Subject: [BioPython] pdf articles from medline Message-ID: Hi, Does anyone know if there is any provision in Biopython to download PDF articles from Medline, if we have a list of pubmed ids? Thank you for your help. -Rohini. From sdavis2 at mail.nih.gov Mon Mar 19 22:37:23 2007 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Mon, 19 Mar 2007 18:37:23 -0400 Subject: [BioPython] pdf articles from medline In-Reply-To: References: Message-ID: <45FF10A3.7090708@mail.nih.gov> Rohini Damle wrote: > Hi, > Does anyone know if there is any provision in Biopython to download > PDF articles from Medline, if we have a list of pubmed ids? > Medline does not store the PDFs, in general, so I don't think that is possible. You could certainly scrape the HTML for links and then follow them, looking for a PDF link, but there isn't a general solution for all journals, etc. Sean From jingzhou2005 at gmail.com Wed Mar 21 15:49:20 2007 From: jingzhou2005 at gmail.com (Jing Zhou) Date: Wed, 21 Mar 2007 11:49:20 -0400 Subject: [BioPython] can PDBParser work on a pdb file without residue sequence information? 
Message-ID: <4b036ac00703210849n43b41ff8hd1f52784452f1d3c@mail.gmail.com> I want to parse a pseudo pdb file I generated for a bunch of points in space. There is no physical meaning to assign residue number or chain number to it. Here is the example of one line: ATOM 1 C -7.083 -6.182 50.181 1.00 (the atom type has no physical meaning either) I just want to be able to display the positions of these points in viewer. What I want to do is to use biopython to parse this pseudo pdb file and display it in vtk. (I am already able to see the points in pymol using this pseudo pdb file) Here is the function I tried to use to parse this pseudo pdb file: from Bio.PDB import PDBParser parser = PDBParser() structure = parser.get_structure('mypdb',mypseudopdbfile) Here is the error message: structure = parser.get_structure('mypdb',mypseudopdbfile) File "C:\Python24\lib\site-packages\Bio\PDB\PDBParser.py", line 66, in get_structure self._parse(file.readlines()) File "C:\Python24\lib\site-packages\Bio\PDB\PDBParser.py", line 87, in _parse self.trailer=self._parse_coordinates(coords_trailer) File "C:\Python24\lib\site-packages\Bio\PDB\PDBParser.py", line 144, in _parse_coordinates resseq=int(split(line[22:26])[0]) # sequence identifier IndexError: list index out of range It seems that the error happens to seek residue sequence information at column 22-26. Sure, I can try to add fake resseq info. but my question is whether there is another way around to neglect the index of residue sequence number? I just want to directly get the array of coordinates and display them. Thanks Jing From zsun at fas.harvard.edu Thu Mar 22 03:41:22 2007 From: zsun at fas.harvard.edu (Zachary Zhipeng Sun) Date: Wed, 21 Mar 2007 23:41:22 -0400 Subject: [BioPython] BLAST SNP access? Message-ID: <000c01c76c33$f98c93a0$eca5bae0$@harvard.edu> Hello, Thank you for the Bioython tools! They are proving increasingly useful in my research. 
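[Editor's sketch] For the pseudo-PDB question above: if the file exists only to carry point coordinates, one workaround is to skip Bio.PDB entirely and pull the numbers out of the ATOM lines by hand, which sidesteps the residue-sequence requirement altogether. A rough sketch, assuming whitespace-separated fields in the order shown in the example line (the helper name is hypothetical):

```python
def read_point_coords(lines):
    """Extract (x, y, z) from ATOM records of a pseudo PDB file,
    ignoring residue/chain bookkeeping entirely.

    Assumes whitespace-separated fields laid out like the example:
    ATOM 1 C -7.083 -6.182 50.181 1.00
    """
    coords = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "ATOM":
            # fields[1] is the serial number, fields[2] the atom name;
            # the next three fields are the coordinates.
            x, y, z = (float(v) for v in fields[3:6])
            coords.append((x, y, z))
    return coords

points = read_point_coords(["ATOM 1 C -7.083 -6.182 50.181 1.00"])
print(points)  # [(-7.083, -6.182, 50.181)]
```

Note this relies on the file being whitespace-delimited as generated; real PDB files use fixed columns, which is exactly why PDBParser insists on the residue-number field.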
I had a question regarding a tool extension, however - is there any link in biopython to query the BLAST SNP database, or does anyone know of this being under development? If not, I am not too familiar with the backend of biopython but I was looking to be able to automate BLAST SNP searches; does anyone have advice on how to start coding this into the biopython environment, or to a version of NCBIWWW.py ? Thanks for your help! Best, Zachary Sun From mdehoon at c2b2.columbia.edu Thu Mar 22 04:21:40 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Thu, 22 Mar 2007 00:21:40 -0400 Subject: [BioPython] BLAST SNP access? In-Reply-To: <000c01c76c33$f98c93a0$eca5bae0$@harvard.edu> References: <000c01c76c33$f98c93a0$eca5bae0$@harvard.edu> Message-ID: <46020454.2060100@c2b2.columbia.edu> Hi Zach, Can you give us an example of how you currently query the BLAST SNP database (without using Biopython)? Just to get an idea of how different it is from current BLAST searches with Biopython. --Michiel. Zachary Zhipeng Sun wrote: > Hello, > > > > Thank you for the Bioython tools! They are proving increasingly useful in my > research. I had a question regarding a tool extension, however - is there > any link in biopython to query the BLAST SNP database, or does anyone know > of this being under development? If not, I am not too familiar with the > backend of biopython but I was looking to be able to automate BLAST SNP > searches; does anyone have advice on how to start coding this into the > biopython environment, or to a version of NCBIWWW.py ? Thanks for your help! > > > > Best, > > Zachary Sun > > > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From cjfields at uiuc.edu Thu Mar 22 04:38:11 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 21 Mar 2007 23:38:11 -0500 Subject: [BioPython] BLAST SNP access? 
In-Reply-To: <46020454.2060100@c2b2.columbia.edu> References: <000c01c76c33$f98c93a0$eca5bae0$@harvard.edu> <46020454.2060100@c2b2.columbia.edu> Message-ID: <1CB2E066-3A73-429E-80B6-639F26FAD144@uiuc.edu> If you are using the NCBI URLAPI interface you can set the databases to anything on the following page, just follow the instructions: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/ remote_accessible_blastdblist.html This includes SNP data. I find this works for BioPerl's RemoteBlast (URLAPI-based). chris On Mar 21, 2007, at 11:21 PM, Michiel de Hoon wrote: > Hi Zach, > > Can you give us an example of how you currently query the BLAST SNP > database (without using Biopython)? Just to get an idea of how > different > it is from current BLAST searches with Biopython. > > --Michiel. > > Zachary Zhipeng Sun wrote: >> Hello, >> >> >> >> Thank you for the Bioython tools! They are proving increasingly >> useful in my >> research. I had a question regarding a tool extension, however - >> is there >> any link in biopython to query the BLAST SNP database, or does >> anyone know >> of this being under development? If not, I am not too familiar >> with the >> backend of biopython but I was looking to be able to automate >> BLAST SNP >> searches; does anyone have advice on how to start coding this into >> the >> biopython environment, or to a version of NCBIWWW.py ? Thanks for >> your help! >> >> >> >> Best, >> >> Zachary Sun >> >> >> >> _______________________________________________ >> BioPython mailing list - BioPython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From zsun at fas.harvard.edu Thu Mar 22 05:01:44 2007 From: zsun at fas.harvard.edu (Zachary Zhipeng Sun) Date: Thu, 22 Mar 2007 01:01:44 -0400 Subject: [BioPython] BLAST SNP access? In-Reply-To: <1CB2E066-3A73-429E-80B6-639F26FAD144@uiuc.edu> References: <000c01c76c33$f98c93a0$eca5bae0$@harvard.edu> <46020454.2060100@c2b2.columbia.edu> <1CB2E066-3A73-429E-80B6-639F26FAD144@uiuc.edu> Message-ID: <000601c76c3f$2fab1870$8f014950$@harvard.edu> Thanks for the replies! Regarding a BLAST SNP search, it uses the blastn or tblastn interface; put in query at http://www.ncbi.nlm.nih.gov/SNP/snp_blastByOrg.cgi, choose db, and result interface is similar to other BLAST components. (sample RID 1174538798-721-146214187572.BLASTQ2 for sample output), except that in the title of hits it shows an rs# which links to dbSNP. I'd imagine the search options would be different, as well as a little bit of parsing the output, but it uses the same engine as blastn. Regarding the URLAPI then: sorry, I'm pretty new to biopython, but is the qblast search feature in the current biopython 1.43 build based on commands to the NCBI URLAPI? If so, then (from someone with moderate experience in coding but little in python or perl) would I be able to painlessly modify the biopython NCBIWWW.py code for my own use? -Zach -----Original Message----- From: Chris Fields [mailto:cjfields at uiuc.edu] Sent: Thursday, March 22, 2007 12:38 AM To: Michiel de Hoon Cc: Zachary Zhipeng Sun; biopython at lists.open-bio.org Subject: Re: [BioPython] BLAST SNP access? If you are using the NCBI URLAPI interface you can set the databases to anything on the following page, just follow the instructions: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/ remote_accessible_blastdblist.html This includes SNP data. I find this works for BioPerl's RemoteBlast (URLAPI-based). 
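[Editor's sketch] The URLAPI that Chris points to is plain HTTP underneath: a qblast-style client submits a search with a "Put" command and later fetches results with a "Get" plus the returned RID. Building the encoded parameter string is ordinary standard-library work; the parameter set below is illustrative (database name taken from this thread, query sequence a placeholder), so consult the URLAPI documentation for the full list:

```python
from urllib.parse import urlencode

# "Put" submits a search; a later "Get" with the returned RID
# retrieves the results.
params = {
    "CMD": "Put",
    "PROGRAM": "blastn",
    "DATABASE": "snp/human_9606/human_9606",  # from the remote db list
    "QUERY": "ACGTACGT",  # placeholder sequence
}
request_body = urlencode(params)
print(request_body)
```

Because the database is just another parameter, pointing an existing qblast-style client at the SNP data should need no changes to the request machinery itself, only a different DATABASE value.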
chris On Mar 21, 2007, at 11:21 PM, Michiel de Hoon wrote: > Hi Zach, > > Can you give us an example of how you currently query the BLAST SNP > database (without using Biopython)? Just to get an idea of how > different > it is from current BLAST searches with Biopython. > > --Michiel. > > Zachary Zhipeng Sun wrote: >> Hello, >> >> >> >> Thank you for the Bioython tools! They are proving increasingly >> useful in my >> research. I had a question regarding a tool extension, however - >> is there >> any link in biopython to query the BLAST SNP database, or does >> anyone know >> of this being under development? If not, I am not too familiar >> with the >> backend of biopython but I was looking to be able to automate >> BLAST SNP >> searches; does anyone have advice on how to start coding this into >> the >> biopython environment, or to a version of NCBIWWW.py ? Thanks for >> your help! >> >> >> >> Best, >> >> Zachary Sun >> >> >> >> _______________________________________________ >> BioPython mailing list - BioPython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From mdehoon at c2b2.columbia.edu Thu Mar 22 18:54:59 2007 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Thu, 22 Mar 2007 14:54:59 -0400 Subject: [BioPython] BLAST SNP access? In-Reply-To: <000601c76c3f$2fab1870$8f014950$@harvard.edu> References: <000c01c76c33$f98c93a0$eca5bae0$@harvard.edu> <46020454.2060100@c2b2.columbia.edu> <1CB2E066-3A73-429E-80B6-639F26FAD144@uiuc.edu> <000601c76c3f$2fab1870$8f014950$@harvard.edu> Message-ID: <4602D103.20106@c2b2.columbia.edu> Zachary Zhipeng Sun wrote: > Thanks for the replies! 
Regarding a BLAST SNP search, it uses the blastn or > tblastn interface; put in query at > http://www.ncbi.nlm.nih.gov/SNP/snp_blastByOrg.cgi, choose db, and result > interface is similar to other BLAST components. (sample RID > 1174538798-721-146214187572.BLASTQ2 for sample output), except that in the > title of hits it shows an rs# which links to dbSNP. I'd imagine the search > options would be different, as well as a little bit of parsing the output, > but it uses the same engine as blastn. It looks like that if you know the name of the database (here "snp/human_9606/human_9606"), then you can run for example from Bio.Blast import NCBIWWW result_handle = NCBIWWW.qblast("blastn", "snp/human_9606/human_9606", seq) and then parse the results as usual (see section 3.4 in the Biopython tutorial). Check on the page that Chris sent you: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_accessible_blastdblist.html to find the correct name for the database. The result_handle should give you the same information as the web page, and Biopython's parser should parse all information from result_handle correctly. If you find that some information seems to be missing, please let us know. Hope this helps, --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From mauriceling at gmail.com Thu Mar 22 23:47:14 2007 From: mauriceling at gmail.com (Maurice Ling) Date: Fri, 23 Mar 2007 10:47:14 +1100 Subject: [BioPython] How to read SOFT files from GEO? Message-ID: <46031582.7060301@acm.org> Hi, Are there any examples to read SOFT file from GEO? I've looked in the cookbook and there isn't any mention about GEO. Thanks in advance, maurice From sdavis2 at mail.nih.gov Fri Mar 23 00:53:01 2007 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu, 22 Mar 2007 20:53:01 -0400 Subject: [BioPython] How to read SOFT files from GEO? 
In-Reply-To: <46031582.7060301@acm.org> References: <46031582.7060301@acm.org> Message-ID: <460324ED.8040504@mail.nih.gov> Maurice Ling wrote: > Hi, > > Are there any examples to read SOFT file from GEO? I've looked in the > cookbook and there isn't any mention about GEO. > I don't know of biopython code to do this, but there is a package for the R statistical language that will do this. The nice thing about doing this from R is that there are hundreds of tools that can then be applied to whatever data you like. The package is part of Bioconductor (http://www.bioconductor.org) and is called GEOquery. And of course, you can use R from python using Rpy. Sean P.S. Legal disclaimer--I am the author of said package, so take my words with the appropriate grain of salt. From mauriceling at gmail.com Fri Mar 23 01:06:07 2007 From: mauriceling at gmail.com (Maurice Ling) Date: Fri, 23 Mar 2007 12:06:07 +1100 Subject: [BioPython] How to read SOFT files from GEO? In-Reply-To: <460324ED.8040504@mail.nih.gov> References: <46031582.7060301@acm.org> <460324ED.8040504@mail.nih.gov> Message-ID: <460327FF.5070909@acm.org> Sean Davis wrote: > Maurice Ling wrote: > >> Hi, >> >> Are there any examples to read SOFT file from GEO? I've looked in the >> cookbook and there isn't any mention about GEO. >> > > I don't know of biopython code to do this, but there is a package for > the R statistical language that will do this. The nice thing about > doing this from R is that there are hundreds of tools that can then be > applied to whatever data you like. The package is part of > Bioconductor (http://www.bioconductor.org) and is called GEOquery. > And of course, you can use R from python using Rpy. > > Sean > > P.S. Legal disclaimer--I am the author of said package, so take my > words with the appropriate grain of salt. > In biopython's CVS, there is a subdirectory called Geo. So I thought that might be for SOFT files... ML P.S. So I know who to ask when I have questions about GEOquery. 
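[Editor's sketch] Absent a maintained parser, SOFT is a line-oriented format that can be split by hand: "^" lines open an entity, "!" lines carry "key = value" attributes, "#" lines describe table columns, and remaining lines are tab-separated table rows. The sketch below is a simplification based on that layout (check the GEO SOFT documentation for corner cases such as repeated attribute keys):

```python
from io import StringIO

def parse_soft(handle):
    """Split a GEO SOFT file into entities.

    '^' lines start an entity, '!' lines are 'key = value' attributes,
    '#' lines describe table columns (skipped here), and anything else
    is treated as a tab-separated table row.
    """
    entities = []
    current = None
    for raw in handle:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("^"):
            etype, _, name = line[1:].partition(" = ")
            current = {"type": etype, "name": name,
                       "attributes": {}, "table": []}
            entities.append(current)
        elif current is None:
            continue  # ignore anything before the first entity
        elif line.startswith("!"):
            key, _, value = line[1:].partition(" = ")
            current["attributes"][key] = value
        elif not line.startswith("#"):
            current["table"].append(line.split("\t"))
    return entities

sample = StringIO("^SAMPLE = GSM1\n"
                  "!Sample_title = test sample\n"
                  "#VALUE = log ratio\n"
                  "ID_REF\tVALUE\n"
                  "A\t1.0\n")
for entity in parse_soft(sample):
    print(entity["type"], entity["name"], entity["attributes"])
```

For anything beyond quick scripting, the GEOquery route Sean describes is the better-tested option; this is only meant to show that the format itself is not a barrier.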
From sdavis2 at mail.nih.gov Fri Mar 23 01:44:42 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu, 22 Mar 2007 21:44:42 -0400
Subject: [BioPython] How to read SOFT files from GEO?
In-Reply-To: <460327FF.5070909@acm.org>
References: <46031582.7060301@acm.org> <460324ED.8040504@mail.nih.gov> <460327FF.5070909@acm.org>
Message-ID: <4603310A.20804@mail.nih.gov>

Maurice Ling wrote:
> Sean Davis wrote:
>
>> Maurice Ling wrote:
>>
>>> Hi,
>>>
>>> Are there any examples to read SOFT files from GEO? I've looked in
>>> the cookbook and there isn't any mention of GEO.
>>>
>>
>> I don't know of Biopython code to do this, but there is a package for
>> the R statistical language that will do this. The nice thing about
>> doing this from R is that there are hundreds of tools that can then
>> be applied to whatever data you like. The package is part of
>> Bioconductor (http://www.bioconductor.org) and is called GEOquery.
>> And of course, you can use R from Python using Rpy.
>>
>> Sean
>>
>> P.S. Legal disclaimer--I am the author of said package, so take my
>> words with the appropriate grain of salt.
>>
> In Biopython's CVS, there is a subdirectory called Geo. So I thought
> that might be for SOFT files...
>

I haven't tried it lately, but last I looked, it was not in sync with
GEO (which does infrequently change tags/formats). Also, I think it
handles GDS only (if I remember correctly). Only a subset of GEO is
available as GDS, since they require hand-curation by GEO staff.

http://portal.open-bio.org/pipermail/biopython-dev/2006-May/002352.html

> P.S. So I know who to ask when I have questions about GEOquery.

Comments and questions are welcome and appreciated.
Sean

From chris.lasher at gmail.com Sat Mar 24 16:35:04 2007
From: chris.lasher at gmail.com (Chris Lasher)
Date: Sat, 24 Mar 2007 12:35:04 -0400
Subject: [BioPython] Biopython to migrate to Subversion
Message-ID: <128a885f0703240935p5c139736yfe3142bdbc9d6ac4@mail.gmail.com>

Hello Biopythonistas,

The Biopython developers are currently planning a migration from CVS
to Subversion as our revision control system. The target for the
migration is the evening of Sunday, May 20, 2007. This change will
mostly impact the developers; however, it may also affect some users.

If you're a user of Biopython and you...

A) install Biopython from the Windows installer, from packages for
your Linux distribution, or from Fink on OS X... you will not be
affected. You may stop here or read on at your own leisure.

-----

B) download and install from the source in the form of the tarball or
zip file... you will not be affected. You may stop here or read on at
your own leisure.

-----

C) retrieve and install Biopython from the CVS repository... the
Biopython devs would really like to hear from you!

For those in category C, the change could mean that you will need a
Subversion client installed on your computer. Clients exist for all
major platforms, including Windows, OS X, and Linux.

Subversion operates through HTTP/HTTPS (ports 80 and 443,
respectively), and specifically uses WebDAV, an extended HTTP
protocol. Though highly unusual, some organizations' networks may
block WebDAV traffic. One way to check whether your organization does
this is to attempt to check out an existing Subversion repository
from, for instance, a Google Code project (all of which use Subversion
repositories). For example, you can attempt to check out Kongulo. If
you can check out an existing repository, you will be ready to migrate
to Biopython's Subversion repository once it is in place.

If Subversion installation will not be possible, or your network
indeed blocks WebDAV traffic, the Biopython devs need to know.
We can support the CVS repository with read-only access and inject
updates from the Subversion repository into the CVS repository. This
involves a bit more work on the part of the devs to set up and
support, so we will take this on only if the necessity exists. For
clarity, Biopython will make a clean move to Subversion and drop CVS
support unless we hear requests for legacy CVS support in advance of
the migration.

-----

If you're a developer of Biopython... you should be on the
Biopython-dev list and have been following this thread:

-----

We will document all things related to Biopython's migration to
Subversion on the Biopython wiki at for interested parties.

The developers look forward to a smooth transition and to having
Subversion in place to assist us in continually improving Biopython.

Thank you for your time and feedback,

Your friendly neighborhood Biopython developers

From aloraine at gmail.com Tue Mar 27 05:31:05 2007
From: aloraine at gmail.com (Ann Loraine)
Date: Mon, 26 Mar 2007 23:31:05 -0600
Subject: [BioPython] question regarding writing Seq objects in Fasta format
Message-ID: <83722dde0703262231u15042538q479c2fb0b81a9590@mail.gmail.com>

Hello,

I have a question about how to write out Bio.Seq.Seq objects to a
fasta format file. I've generated a lot of these by translating
segments of genomic sequence -- see below.

What objects or code should I use? It looks like
Bio.SeqIO.FASTA.FastaWriter might be the right thing, but it doesn't
appear to accept Bio.Seq.Seq objects. Do I need to create a new type
of Seq-like object before I can use Bio.SeqIO.FASTA.FastaWriter?

Thank you for your time!
Yours,
Ann

xxx my code for generating the Seq objects I want to write

def feat2aaseq(feat, seq):
    """
    Function: translate the given feature
    Returns : a Bio.Seq.Seq [biopython]
    Args    : feat - feature.DNASeqFeature [not biopython]
              seq - a Bio.Seq.Seq, e.g., a chromosome
    """
    start = feat.start()
    end = feat.start() + feat.length()
    fullseq = seq.sequence
    subseq_ob = Seq(fullseq[start:end], IUPAC.unambiguous_dna)
    if feat.strand() == -1:
        subseq_ob = subseq_ob.reverse_complement()
    # NCBI translation table 4
    translator = Translate.unambiguous_dna_by_id[4]
    aaseq = translator.translate(subseq_ob)
    return aaseq

--
Ann Loraine, Assistant Professor
Departments of Genetics, Biostatistics, Computer and Information Sciences
Associate Scientist, Comprehensive Cancer Center
University of Alabama at Birmingham
http://www.transvar.org
205-996-4155

From biopython at maubp.freeserve.co.uk Tue Mar 27 11:35:54 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 27 Mar 2007 20:35:54 +0900
Subject: [BioPython] question regarding writing Seq objects in Fasta format
In-Reply-To: <83722dde0703262231u15042538q479c2fb0b81a9590@mail.gmail.com>
References: <83722dde0703262231u15042538q479c2fb0b81a9590@mail.gmail.com>
Message-ID: <320fb6e00703270435k34390ce2y46937ab07578d7a7@mail.gmail.com>

On 3/27/07, Ann Loraine wrote:
> Hello,
>
> I have a question about how to write out Bio.Seq.Seq objects to a
> fasta format file.
>
> I've generated a lot of these by translating segments of genomic
> sequence -- see below.
>
> What objects or code should I use?

I would suggest you try the new Bio.SeqIO code described here; you
will need Biopython 1.43 or later:

http://biopython.org/wiki/SeqIO

You'll need to "upgrade" your Seq objects into SeqRecord objects and
give them an identifier before calling Bio.SeqIO.write() on them.

We'd welcome feedback on this, for example if there are any errors in
the newly written/updated documentation.
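[For readers of the archive: the "upgrade" Peter describes might look
something like the sketch below. It assumes a modern Biopython (1.78
or later, where Seq objects no longer take an alphabet argument, unlike
the 1.43-era API in this thread); the sequence strings and the feat1/
feat2 identifiers are made up, standing in for feat2aaseq() output.]

```python
import io
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio import SeqIO

# Made-up translated sequences, keyed by a made-up feature name.
translations = {"feat1": Seq("MKVLT"), "feat2": Seq("MTEYKLVV")}

# Wrap each plain Seq in a SeqRecord so it carries an id for the
# FASTA header line; an empty description keeps the header minimal.
records = [SeqRecord(seq, id=name, description="")
           for name, seq in translations.items()]

# Bio.SeqIO.write accepts a filename or an open handle; a StringIO
# handle makes the output easy to inspect here.
handle = io.StringIO()
SeqIO.write(records, handle, "fasta")
fasta_text = handle.getvalue()
print(fasta_text)
```

In a real script you would pass a filename (e.g. "out.faa") instead of
the StringIO handle.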
> It looks like Bio.SeqIO.FASTA.FastaWriter might be the right thing,
> but it doesn't appear to accept Bio.Seq.Seq objects. Do I need to
> create a new type of Seq-like object before I can use
> Bio.SeqIO.FASTA.FastaWriter?

We plan to mark that particular (undocumented) bit of Biopython as
deprecated in the next release - so I can't really recommend using it.

Personally, before working on Bio.SeqIO I used to write fasta files
"by hand" using something like this:

handle = open("example.faa", "w")
for identifier, seq in some_list_of_tuples:
    handle.write(">%s\n%s\n" % (identifier, seq.tostring()))
handle.close()

where identifier is a string, and seq is a Biopython Seq object. There
is a lot to be said for doing it "by hand" if you want full control
over the description, for example.

Peter

From chris.lasher at gmail.com Wed Mar 28 17:19:12 2007
From: chris.lasher at gmail.com (Chris Lasher)
Date: Wed, 28 Mar 2007 13:19:12 -0400
Subject: [BioPython] Biopython to migrate to Subversion
In-Reply-To: <128a885f0703240935p5c139736yfe3142bdbc9d6ac4@mail.gmail.com>
References: <128a885f0703240935p5c139736yfe3142bdbc9d6ac4@mail.gmail.com>
Message-ID: <128a885f0703281019k6c64807dpa4e5d54621c944cc@mail.gmail.com>

I have a correction to make. Again, this only affects those of you who
obtain your Biopython code via CVS.

On 3/24/07, Chris Lasher wrote:
> Subversion operates through HTTP/HTTPS (ports 80 and 443,
> respectively), and specifically uses WebDAV, an extended HTTP
> protocol. Though highly unusual, some organizations' networks may
> block WebDAV traffic. One way to check whether your organization does
> this is to attempt to check out an existing Subversion repository
> from, for instance, a Google Code project (all of which use Subversion
> repositories). For example, you can attempt to check out Kongulo. If
> you can check out an existing repository, you will be ready to migrate
> to Biopython's Subversion repository once it is in place.
The Subversion server runs through SSH, *not* through WebDAV, and so
is accessed in the same way that the CVS repository is now. If you can
access the CVS repository now, you will be able to access the
Subversion repository once we implement it. Therefore, the only case
in which legacy CVS support will be needed is if you cannot get
Subversion installed. If this is the case, please notify me as soon as
possible.

Thanks,
Chris