From bugzilla-daemon at portal.open-bio.org Mon Jun 2 04:19:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Jun 2008 04:19:50 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
2.2.18 though works with blastpgp 2.2.15
In-Reply-To:
Message-ID: <200806020819.m528JoXn006809@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2502
------- Comment #19 from ibdeno at gmail.com 2008-06-02 04:19 EST -------
Thank you, Peter.
In principle, I don't use that information. I will try then with the XML
parser.
Cheers,
Miguel
(In reply to comment #18)
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Jun 2 04:49:55 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Jun 2008 04:49:55 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp
2.2.18 though works with blastpgp 2.2.15
In-Reply-To:
Message-ID: <200806020849.m528ntdY008609@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2502
biopython-bugzilla at maubp.freeserve.co.uk changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED
------- Comment #20 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-02 04:49 EST -------
Marking this bug as fixed.
The original report was about parsing the plain text output which is fixed -
see comment 12, and Bio/Blast/NCBIStandalone.py CVS revision 1.72. I have not
added the 2.2.18 plain text file as a unit test since it's over 750 kB.
For the XML output from 2.2.18, as far as I can tell we are not ignoring any
important information from PSI-BLAST, as it is simply not included. If the
NCBI updates the XML output from blastpgp then we should revisit the XML
parsing.
Thank you Miguel for your report and assistance.
From bugzilla-daemon at portal.open-bio.org Mon Jun 2 06:37:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Jun 2008 06:37:51 -0400
Subject: [Biopython-dev] [Bug 2503] An error when parsing NCBIWWW Blast
output
In-Reply-To:
Message-ID: <200806021037.m52Abpj9019177@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2503
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-02 06:37 EST -------
Dear Prashanth,
Unless you can provide some more information, I'm going to have to close Bug
2503, as you haven't given us enough to go on.
Peter
From bugzilla-daemon at portal.open-bio.org Mon Jun 2 08:57:20 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Jun 2008 08:57:20 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To:
Message-ID: <200806021257.m52CvKt4026676@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=1944
------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-02 08:57 EST -------
I've added simple __str__ and __repr__ methods to the alignment class in
Bio/Align/Generic.py CVS revision 1.8, which give output like this:
str(a):
DNAAlphabet() alignment with 3 rows and 14 columns
ACGATCAGCTAGCT Alpha
CCGATCAGCTAGCT Beta
ACGATGAGCTAGCT Gamma
repr(a):
<__main__.Alignment instance (3 records of length 14, DNAAlphabet()) at 9e96c2c>
The string output gets truncated to show a maximum of 20 rows and 50 columns,
which allowing for typical identifiers will still display nicely on a default
terminal.
I now intend to update the tutorial, as being able to print an alignment should
make it much easier to explain and get to grips with.
Note that there is still some interesting code in both attachment 732 (the
__getitem__ method) and in attachment 770 (e.g. subclassing list and adding
__len__, __add__, __radd__ etc).
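A minimal sketch of how such truncating __str__ and __repr__ methods could
look, written for current Python 3. ToyAlignment is a hypothetical stand-in,
not the class in Bio/Align/Generic.py, and the "..." markers used for cut rows
and omitted rows are assumptions; only the 20 row and 50 column limits come
from the comment above.

```python
class ToyAlignment:
    """Hypothetical stand-in for the alignment class, not the Biopython API."""

    MAX_ROWS = 20  # limits quoted in the comment above
    MAX_COLS = 50

    def __init__(self, alphabet_name, rows):
        # rows: list of (sequence string, identifier) pairs of equal length
        self.alphabet_name = alphabet_name
        self.rows = list(rows)

    def __str__(self):
        n_cols = len(self.rows[0][0]) if self.rows else 0
        lines = ["%s alignment with %i rows and %i columns"
                 % (self.alphabet_name, len(self.rows), n_cols)]
        for seq, name in self.rows[:self.MAX_ROWS]:
            if len(seq) > self.MAX_COLS:
                # Assumed convention: end a cut row with "..."
                seq = seq[:self.MAX_COLS - 3] + "..."
            lines.append("%s %s" % (seq, name))
        if len(self.rows) > self.MAX_ROWS:
            lines.append("...")  # assumed marker for omitted rows
        return "\n".join(lines)

    def __repr__(self):
        n_cols = len(self.rows[0][0]) if self.rows else 0
        return "<%s instance (%i records of length %i, %s) at %x>" % (
            self.__class__.__name__, len(self.rows), n_cols,
            self.alphabet_name, id(self))


a = ToyAlignment("DNAAlphabet()", [("ACGATCAGCTAGCT", "Alpha"),
                                   ("CCGATCAGCTAGCT", "Beta"),
                                   ("ACGATGAGCTAGCT", "Gamma")])
print(a)
```

Keeping the limits as class attributes means a subclass or a caller could
adjust them without touching the formatting code.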
From bugzilla-daemon at portal.open-bio.org Mon Jun 2 09:26:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Jun 2008 09:26:28 -0400
Subject: [Biopython-dev] [Bug 2507] New: Adding __getitem__ to SeqRecord for
element access and slicing
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2507
Summary: Adding __getitem__ to SeqRecord for element access and
slicing
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
OtherBugsDependingOnThis: 1944
With a Seq object, you can access individual letters and create sub-sequences
using slicing. You can even use a stride to reverse the sequence, or select
every third letter.
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_seq = Seq('GATCGATGGGCCTATATAGGATCGAAAATCGC', IUPAC.unambiguous_dna)
>>> print my_seq
GATCGATGGGCCTATATAGGATCGAAAATCGC
>>> my_seq
Seq('GATCGATGGGCCTATATAGGATCGAAAATCGC', IUPACUnambiguousDNA())
>>> my_seq[5:10]
Seq('ATGGG', IUPACUnambiguousDNA())
>>> my_seq[::-1]
Seq('CGCTAAAAGCTAGGATATATCCGGGTAGCTAG', IUPACUnambiguousDNA())
>>> my_seq[5]
'A'
Currently, these operations cannot be done with a SeqRecord object. This
enhancement bug is to allow element access and slicing (perhaps even with a
stride) on SeqRecord objects, where the annotations are taken into
consideration, and preserved as far as reasonably possible.
Looking at the different SeqRecord properties, this is what I think should
happen for creating a sub-sequence:
.id, .name, .description (three strings) - preserve?
Blindly preserving these may not always be meaningful. For example, if the
description was "Complete plasmid" then it doesn't really apply to a
sub-sequence. Perhaps we should preserve only the id and name, and set the
description to "sub-sequence"?
.annotations (dictionary) - either preserve or lose?
Some annotation entries will still be valid for a sub-sequence (e.g. "source"
or references). Others will not (e.g. anything describing its coordinates
within a larger parent sequence). There is no reliable way to decide on a case
by case basis.
.dbxrefs (list of strings) - preserve?
Any database cross-references would arguably still apply to a sub-sequence or
even a reversed sequence.
.features (list of SeqFeatures) - select only those features still in the new
sub-sequence, and adjust their locations for the new coordinates. Supporting
strides other than +1 would be complicated! For simplicity, I would say any
feature only partially within the sub-sequence should be discarded.
In summary, one clearly defined set of actions on creating a sub-sequence could
be to preserve all the annotation data except the SeqFeatures which would be
handled sensibly.
[If we later support "per-letter-annotation" in either a Seq or SeqRecord
subclass, then this too should be sliced]
Adding a __getitem__ method to the SeqRecord as outlined above should be
compatible with the suggestion that the SeqRecord subclasses the Seq object
(see bug 2351).
A related point, when accessing single letters, e.g. record[0], should a single
letter string be returned (which lacks any annotation) as currently happens
with the Seq object?
P.S. I'm marking this new enhancement bug as blocking bug 1944. Once SeqRecord
objects support slicing, this would make annotation-preserving slicing of
alignment objects much more straightforward.
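A minimal sketch of the feature handling proposed above, assuming a stride of
+1. The Feature tuple and slice_features function are hypothetical names used
only for illustration (this is not the Bio.SeqFeature API), and coordinates
are assumed to be half-open, Python style. Features lying wholly inside the
slice are kept and shifted; partial overlaps are discarded.

```python
from collections import namedtuple

# Hypothetical minimal feature with half-open [start, end) coordinates.
Feature = namedtuple("Feature", ["start", "end", "label"])


def slice_features(features, index, seq_length):
    """Return the features wholly inside the slice, in the new coordinates."""
    start, stop, step = index.indices(seq_length)
    if step != 1:
        # Other strides would need per-feature logic, as noted in the report.
        raise ValueError("only a stride of +1 is handled in this sketch")
    kept = []
    for feature in features:
        if start <= feature.start and feature.end <= stop:
            kept.append(feature._replace(start=feature.start - start,
                                         end=feature.end - start))
    return kept


features = [Feature(2, 8, "inside"),
            Feature(0, 3, "overlaps the left edge"),
            Feature(9, 12, "overlaps the right edge")]
print(slice_features(features, slice(2, 10), 32))
```

Using slice.indices() lets Python resolve negative or missing bounds, so the
comparison only ever sees plain non-negative coordinates.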
From bugzilla-daemon at portal.open-bio.org Mon Jun 2 09:26:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Jun 2008 09:26:33 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To:
Message-ID: <200806021326.m52DQXk2029561@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=1944
biopython-bugzilla at maubp.freeserve.co.uk changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |2507
From bugzilla-daemon at portal.open-bio.org Mon Jun 2 10:00:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Jun 2008 10:00:15 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
element access and slicing
In-Reply-To:
Message-ID: <200806021400.m52E0FJK032027@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2507
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-02 10:00 EST -------
Simple implementation which ignores the features (non-trivial), to be added to
the SeqRecord class in Bio/SeqRecord.py

    def __getitem__(self, index) :
        if isinstance(index, int) :
            #TODO - Should single letters be returned as just
            #strings? This prevents the inclusion of any annotation.
            #Revisit this once the Seq object is a subclass of string.
            return self.seq[index]
        elif isinstance(index, slice) :
            answer = self.__class__(self.seq[index],
                                    id=self.id,
                                    name=self.name,
                                    description=self.description)
            #COPY the annotation dict and dbxrefs list:
            answer.annotations = dict(self.annotations)
            answer.dbxrefs = self.dbxrefs[:]
            #TODO - select relevant features, and add them with
            #adjusted coordinates. Take special care with a stride!
            return answer
        raise ValueError("Invalid index")
From bugzilla-daemon at portal.open-bio.org Mon Jun 2 10:12:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Jun 2008 10:12:29 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
element access and slicing
In-Reply-To:
Message-ID: <200806021412.m52ECT86000330@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2507
------- Comment #2 from jblanca at btc.upv.es 2008-06-02 10:12 EST -------
Does this mean that SeqRecord would deprecate the .seq attribute? If the .seq
attribute is not removed, slicing could be done in two ways: my_seq[1:100] and
my_seq.seq[1:100].
From biopython at maubp.freeserve.co.uk Mon Jun 2 10:14:40 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 2 Jun 2008 15:14:40 +0100
Subject: [Biopython-dev] sequence class proposal
In-Reply-To: <1211779470.483a498e18e3e@webmail.upv.es>
References: <320fb6e00805251437n34362f0bm2a323cd1194afaa@mail.gmail.com>
<1211779470.483a498e18e3e@webmail.upv.es>
Message-ID: <320fb6e00806020714s2c789f61ke676a448e2ec871a@mail.gmail.com>
In reply to Jose, I (Peter) wrote:
>> One of your points seemed to be that the SeqRecord couldn't have a
>> __getitem__ and methods like reverse, complement, etc. I don't see
>> why it couldn't have these. Perhaps rather than introducing a whole
>> new class, enhancing the SeqRecord would be a better avenue.
I've filed Bug 2507 to try and show what I had in mind for the
__getitem__ method.
http://bugzilla.open-bio.org/show_bug.cgi?id=2507
Adding further methods for (reverse) complement etc could be done in
much the same way.
Returning to extending Biopython to support per-letter-annotation, I
can see two options:
Right now, the SeqRecord object HAS a Seq object. If we create a new
RichSeq which subclasses the Seq object to provide
per-letter-annotation, then you could use a SeqRecord where the .seq
property is in fact a RichSeq object. The SeqRecord class doesn't
need to have any changes made for this to work (assuming the RichSeq
provides the same API as the Seq object).
If we make the SeqRecord a subclass of the Seq object, then I would
suggest either RichSeq subclassing SeqRecord subclassing Seq, or
perhaps SeqRecord subclassing RichSeq subclassing Seq. It depends on
if you think the id/name/description/dbxrefs/etc properties would be
useful in common use cases of the RichSeq object.
It's not going to be possible for all three classes to have the same
__init__ parameters without breaking existing scripts (and only
supporting the lowest common denominator).
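A minimal sketch of the first option (the SeqRecord HAS a Seq, which may in
fact be a RichSeq), written for current Python 3. ToySeq, ToyRichSeq and
ToySeqRecord are hypothetical stand-ins, not the Biopython classes, and the
letter_annotation name is an assumption.

```python
class ToySeq:
    """Hypothetical stand-in for the Seq object."""

    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __str__(self):
        return self.data

    def __getitem__(self, index):
        if isinstance(index, slice):
            return self.__class__(self.data[index])
        return self.data[index]


class ToyRichSeq(ToySeq):
    """Seq subclass carrying one annotation value per letter."""

    def __init__(self, data, letter_annotation=None):
        super().__init__(data)
        if letter_annotation is None:
            letter_annotation = [None] * len(data)
        if len(letter_annotation) != len(data):
            raise ValueError("need exactly one annotation value per letter")
        self.letter_annotation = list(letter_annotation)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slice the letters and their annotation together.
            return self.__class__(self.data[index],
                                  self.letter_annotation[index])
        return self.data[index]


class ToySeqRecord:
    """Record that HAS a sequence; it needs no change to hold a ToyRichSeq."""

    def __init__(self, seq, id="<unknown id>"):
        self.seq = seq
        self.id = id


record = ToySeqRecord(ToyRichSeq("ACGT", [30, 31, 32, 33]), id="read1")
part = record.seq[1:3]
print(str(part), part.letter_annotation)
```

The record class is untouched here, which is the appeal of this option; the
cost is that annotation then lives in two places, on the record and on the
sequence it holds.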
Peter
From jblanca at btc.upv.es Mon Jun 2 15:11:19 2008
From: jblanca at btc.upv.es (Blanca Postigo Jose Miguel)
Date: Mon, 2 Jun 2008 21:11:19 +0200
Subject: [Biopython-dev] Fwd: Re: sequence class proposal
Message-ID: <1212433879.484445d7a6117@webmail.upv.es>
----- Forwarded message from Blanca Postigo Jose Miguel -----
Date: Mon, 2 Jun 2008 21:08:59 +0200
From: Blanca Postigo Jose Miguel
Reply-To: Blanca Postigo Jose Miguel
Subject: Re: [Biopython-dev] sequence class proposal
To: Peter
Quoting Peter:
> In reply to Jose, I (Peter) wrote:
> >> One of your points seemed to be that the SeqRecord couldn't have a
> >> __getitem__ and methods like reverse, complement, etc. I don't see
> >> why it couldn't have these. Perhaps rather than introducing a whole
> >> new class, enhancing the SeqRecord would be a better avenue.
>
> I've filed Bug 2507 to try and show what I had in mind for the
> __getitem__ method.
> http://bugzilla.open-bio.org/show_bug.cgi?id=2507
I think that would be great. I've just added to the bug a question about the
.seq property of SeqRecord.
> Adding further methods for (reverse) complement etc could be done in
> much the same way.
>
> Returning to extending Biopython to support per-letter-annotation, I
> can see two options:
>
> Right now, the SeqRecord object HAS a Seq object. If we create a new
> RichSeq which subclasses the Seq object to provide
> per-letter-annotation, then you could use a SeqRecord where the .seq
> property is in fact a RichSeq object. The SeqRecord class doesn't
> need to have any changes made for this to work (assuming the RichSeq
> provides the same API as the Seq object).
Here I had a slightly different idea, but maybe yours is better. Basically my
RichSeq proposal is just a SeqRecord with slicing and without the seq property.
The problem with the approach that you describe is that the RichSeq would hold
the per-letter-annotation, so SeqRecord would have the general annotation and
RichSeq (in the .seq) would have other features. I would find that confusing.
>
> If we make the SeqRecord a subclass of the Seq object, then I would
> suggest either RichSeq subclassing SeqRecord subclassing Seq, or
> perhaps SeqRecord subclassing RichSeq subclassing Seq. It depends on
> if you think the id/name/description/dbxrefs/etc properties would be
> useful in common use cases of the RichSeq object.
If SeqRecord is a subclass of Seq, RichSeq is not necessary anymore. That's what
I was proposing. The problem is that current users of SeqRecord would have a
hard time with the new behaviour, because in that case supporting the seq
property would be hard. To avoid that breakage I was proposing to create
RichSeq. RichSeq would be just the SeqRecord that you propose, but it would
allow users to migrate to RichSeq without forcing them to change to a new
SeqRecord behaviour.
>
> It's not going to be possible for all three classes to have the same
> __init__ parameters without breaking existing scripts (and only
> supporting the lowest common denominator).
That's another reason to rename your new proposed SeqRecord to RichSeq.
>
> Peter
>
Jose Blanca
--
----- End of forwarded message -----
--
From biopython at maubp.freeserve.co.uk Mon Jun 2 15:51:30 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 2 Jun 2008 20:51:30 +0100
Subject: [Biopython-dev] Fwd: Re: sequence class proposal
In-Reply-To: <1212433879.484445d7a6117@webmail.upv.es>
References: <1212433879.484445d7a6117@webmail.upv.es>
Message-ID: <320fb6e00806021251q6cc1a7e8p36125c1326ab7a14@mail.gmail.com>
Jose wrote:
> > I've filed Bug 2507 to try and show what I had in mind for the
> > __getitem__ method.
> > http://bugzilla.open-bio.org/show_bug.cgi?id=2507
>
> I think that would be great.
Good :)
Does anyone else want to comment?
> I've just added to the bug a question about the .seq property of SeqRecord.
http://bugzilla.open-bio.org/show_bug.cgi?id=2507#c2 reads:
> Does this mean that SeqRecord would deprecate the .seq attribute?
> If the .seq attribute is not removed slicing could be used in it like:
> my_seq[1:100] and my_seq.seq[1:100].
I was not intending to deprecate the SeqRecord's .seq property at this
time (I think that should happen in preparation for if/when the
SeqRecord becomes a subclass of the Seq object).
With my idea described on bug 2507, given a SeqRecord object my_seq_record:
my_seq_record[1:100] -> another SeqRecord (with annotation)
my_seq_record.seq[1:100] -> just a Seq object (no annotation)
my_seq_record.seq.tostring()[1:100] -> just a string (no annotation or alphabet)
str(my_seq_record.seq)[1:100] -> just a string (no annotation or alphabet)
These trivial examples would all "contain" the same sequence string.
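A minimal sketch of the behaviour listed above, written for current Python 3.
ToyRecord is a hypothetical cut-down record in which a plain string stands in
for the Seq object, so it only illustrates the difference between slicing the
record and slicing its .seq; it is not the Biopython class.

```python
class ToyRecord:
    """Hypothetical cut-down record; a plain str stands in for the Seq."""

    def __init__(self, seq, id="<unknown id>", name="<unknown name>",
                 description="<unknown description>"):
        self.seq = seq
        self.id = id
        self.name = name
        self.description = description
        self.annotations = {}
        self.dbxrefs = []

    def __getitem__(self, index):
        if isinstance(index, int):
            return self.seq[index]
        if isinstance(index, slice):
            answer = self.__class__(self.seq[index], id=self.id,
                                    name=self.name,
                                    description=self.description)
            # Shallow copies, so the new record can be edited independently.
            answer.annotations = dict(self.annotations)
            answer.dbxrefs = self.dbxrefs[:]
            return answer
        raise ValueError("Invalid index")


record = ToyRecord("GATCGATGGGCCTATATAGG", id="demo")
record.annotations["source"] = "synthetic"
sub_record = record[1:10]          # another record, annotation preserved
sub_record.annotations["source"] = "edited"
print(sub_record.seq, sub_record.id)
print(record.annotations["source"])
```

Because the copies are shallow, a mutable value stored inside the annotations
dictionary (a list of references, say) would still be shared between the
parent and the sub-record.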
This enhancement could be done right now, and shouldn't impede any
future per-letter-annotation enhancements.
Perhaps per-letter-annotation enhancements could be added to the
SeqRecord class directly... I need to fully digest the discussion on
the BioSQL list, see:
http://lists.open-bio.org/pipermail/biosql-l/2008-May/thread.html
Peter
From mjldehoon at yahoo.com Mon Jun 2 20:19:59 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 2 Jun 2008 17:19:59 -0700 (PDT)
Subject: [Biopython-dev] Bio.Entrez & Bio.EUtil
In-Reply-To: <320fb6e00805300717v60f0b153i88b5e9a8aee1744c@mail.gmail.com>
Message-ID: <624249.42121.qm@web62408.mail.re1.yahoo.com>
OK I'll double-check. I may not have noticed some missing DTDs if they were downloaded automatically from the internet. I think Biopython should ship the most common DTDs. At least the ones needed for test_Entrez, which probably covers most of the use cases of Bio.Entrez.
--Michiel.
Peter wrote: On 24 May 2008, Michiel de Hoon wrote:
> Dear all,
>
> I have essentially completed the parser in Bio.Entrez.
The internals of the new design look more complicated to start with,
but I can see how much more general it is than the older versions :)
Should it work starting from an empty DTDs folder - or will we ship
Biopython with most of the current files? I've had trouble with
Biopython trying to fetch missing DTD files from the internet. I
think the problem is the NCBI using relative URLs. The following
quick hack seems to help in Parser.py but only in some cases (because
as listed below, the NCBI have two different base paths):
279,280c279,288
< warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % filename)
< handle = urllib.urlopen(systemId)
---
> warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % path)
> if "/" in systemId :
>     #Assume this is a full path, e.g.
>     #http://www.ncbi.nlm.nih.gov/entrez/query/DTD/nlmmedline_080101.dtd
>     handle = urllib.urlopen(systemId)
> else :
>     #Its a relative path, and I'm not sure how to best get the base path:
>     handle = urllib.urlopen("http://www.ncbi.nlm.nih.gov/entrez/query/DTD/"+systemId)
(Also note there seem to be some tab/space issues in this file).
From http://www.ncbi.nlm.nih.gov/entrez/query/DTD/ I've downloaded the
following files using wget:
egquery.dtd
eSearch_020511.dtd
nlmcommon_080101.dtd
pubmed_080101.dtd
eInfo_020511.dtd
eSpell.dtd
nlmmedline_080101.dtd
taxon.dtd
eLink_020511.dtd
eSummary_041029.dtd
nlmmedlinecitation_080101.dtd
uilist.dtd
ePost_020511.dtd
nlmsharedcatcit_080101.dtd
Additionally http://www.ncbi.nlm.nih.gov/dtd/ provided some further
DTD files needed for the test_Entrez.py unit test:
NCBI_GBSeq.dtd
NCBI_GBSeq.mod.dtd
NCBI_Entity.mod.dtd
NCBI_Mim.dtd
NCBI_Mim.mod.dtd
With all the above files, then the unit test file test_Entrez.py
doesn't give any missing DTD warnings - but still has a couple of
failures.
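A minimal sketch of the relative-URL handling discussed above, written for
current Python 3 (the thread itself uses Python 2's urllib.urlopen). The
names DEFAULT_DTD_BASE, resolve_dtd_url and locate_dtd are hypothetical and
not part of Bio.Entrez. The default base URL is the one from the quick hack,
and since the NCBI uses at least two base paths the caller may need to pass a
different one.

```python
import os
from urllib.parse import urljoin, urlsplit

# Assumed default, taken from the quick hack in this message.
DEFAULT_DTD_BASE = "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/"


def resolve_dtd_url(system_id, base=DEFAULT_DTD_BASE):
    """Return an absolute URL for a DTD system identifier.

    urljoin leaves absolute URLs untouched and resolves relative ones
    against the base, so no explicit test for "/" is needed.
    """
    return urljoin(base, system_id)


def locate_dtd(system_id, local_dir, base=DEFAULT_DTD_BASE):
    """Prefer a shipped copy in local_dir; otherwise return the URL to fetch."""
    filename = os.path.basename(urlsplit(system_id).path)
    local_path = os.path.join(local_dir, filename)
    if os.path.isfile(local_path):
        return local_path
    return resolve_dtd_url(system_id, base)


print(resolve_dtd_url("eSearch_020511.dtd"))
print(resolve_dtd_url("http://www.ncbi.nlm.nih.gov/dtd/NCBI_GBSeq.dtd"))
```

Checking the local folder first matches the idea of shipping the common DTDs
with Biopython, with the network used only as a fallback.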
Peter
From bugzilla-daemon at portal.open-bio.org Tue Jun 3 00:39:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Jun 2008 00:39:24 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200806030439.m534dOYI021682@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp 2008-06-03 00:39 EST -------
I agree that type checking is a problem.
I am not sure if a specialized function in Bio.File is a good idea. The
question is not if "this object is a file-like object", but "does this object
have the attributes/methods needed". So I would prefer to add checks only for
the required attributes/methods in each of the iterators.
From mjldehoon at yahoo.com Tue Jun 3 00:33:27 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 2 Jun 2008 21:33:27 -0700 (PDT)
Subject: [Biopython-dev] Bio.Entrez & Bio.EUtil
In-Reply-To: <624249.42121.qm@web62408.mail.re1.yahoo.com>
Message-ID: <112249.61498.qm@web62410.mail.re1.yahoo.com>
I checked but I did not see any missing DTDs. Most of the DTDs in the list you sent are in Biopython's CVS under Bio/Entrez/DTDs, and are included correctly if I do a fresh checkout from CVS. Could you try with a fresh checkout?
--Michiel.
_______________________________________________
Biopython-dev mailing list
Biopython-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython-dev
From bugzilla-daemon at portal.open-bio.org Tue Jun 3 05:16:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Jun 2008 05:16:48 -0400
Subject: [Biopython-dev] [Bug 2446] Comments in CT tags cause
Bio.Sequencing.Ace.ACEParser to fail.
In-Reply-To:
Message-ID: <200806030916.m539GmwZ001955@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2446
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-03 05:16 EST -------
As pointed out on the mailing list, the test cases attached to this bug have
disappeared (some expiry issue?). In the meantime, we could probably just
edit the sole existing test case in Tests/Ace/contig1.ace to add a comment to
an existing CT tag.
Looking at this file, for example edit:
CT{
Contig1 repeat phrap 52 53 555456:555432
This is the forst line of comment for c1
and this the second for c1
}
to become:
CT{
Contig1 repeat phrap 52 53 555456:555432
COMMENT{
This is the first line of comment for c1
and this the second for c1}
}
In the short term, we could either ignore the COMMENT tags within a CT tag, or
just treat them as plain text. Supporting the nested structure would require
changes to the current Record structure.
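A minimal sketch of the two short-term options (ignore the COMMENT block, or
treat its lines as plain text), written for current Python 3. parse_ct_block
is a hypothetical helper, not the Bio.Sequencing.Ace parser, and it assumes
the layout of the example above: COMMENT{ on its own line, with the closing
brace at the end of the last comment line. Real files may close the block
differently.

```python
def parse_ct_block(lines, keep_comments=True):
    """Return (header fields, text lines) for one CT{ ... } block."""
    iterator = iter(lines)
    if next(iterator).strip() != "CT{":
        raise ValueError("expected a line containing 'CT{'")
    header = next(iterator).split()
    notes = []
    in_comment = False
    for raw in iterator:
        line = raw.rstrip("\n")
        if in_comment:
            # Assumed convention: a trailing "}" ends the nested comment.
            closed = line.rstrip().endswith("}")
            text = line.rstrip()[:-1] if closed else line
            if keep_comments and text.strip():
                notes.append(text)
            in_comment = not closed
        elif line.strip() == "COMMENT{":
            in_comment = True
        elif line.strip() == "}":
            return header, notes
        else:
            notes.append(line)
    raise ValueError("unterminated CT block")


example = """CT{
Contig1 repeat phrap 52 53 555456:555432
COMMENT{
This is the first line of comment for c1
and this the second for c1}
}
""".splitlines()
print(parse_ct_block(example))
print(parse_ct_block(example, keep_comments=False))
```

Both options return a flat list of text lines, which is why neither needs a
change to the current Record structure.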
From bugzilla-daemon at portal.open-bio.org Tue Jun 3 07:46:58 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Jun 2008 07:46:58 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200806031146.m53BkwAB009224@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #5 from cracka80 at gmail.com 2008-06-03 07:46 EST -------
(In reply to comment #4)
> I agree that type checking is a problem.
> I am not sure if a specialized function in Bio.File is a good idea. The
> question is not if "this object is a file-like object", but "does this object
> have the attributes/methods needed". So I would prefer to add checks only for
> the required attributes/methods in each of the iterators.
>
The function I have written does exactly this - it checks for the necessary
attributes and methods for a given object. The iterators would then only need
to call ``File.is_filelike()`` on each object passed into them, rather than a
type checking procedure. This is in accordance with the design pattern "Program
to an 'interface', not an 'implementation'." (Gang of Four). Would you like me
to provide a diff against the current revision of Biopython, with suggested
changes?
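A minimal sketch of an attribute-based check of this kind, written for
current Python 3. This is not the function attached to the bug; the body and
the default list of required method names are assumptions for illustration.

```python
import io


def is_filelike(obj, required=("read", "readline", "readlines", "__iter__")):
    """Return True if obj has every named attribute and each is callable."""
    return all(callable(getattr(obj, name, None)) for name in required)


print(is_filelike(io.StringIO("ACGT\n")))          # a real file-like object
print(is_filelike("just a string"))                # iterable, but no read()
print(is_filelike(["a line\n"], ("__iter__",)))    # enough for line iteration
```

Letting each caller pass the names it actually needs keeps the check from
rejecting objects that would in practice work with a given iterator.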
From bugzilla-daemon at portal.open-bio.org Tue Jun 3 11:07:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Jun 2008 11:07:35 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200806031507.m53F7Zm7019694@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2008-06-03 11:07 EST -------
Two things:
1) Some of the code that does type checking for file-like-ness seems to be
quite old and possibly outdated (e.g. Gobase.Iterator). We should take this
opportunity to go through these modules and check if they are still useful.
2) Many of these modules (especially the ones that use an "Iterator" class)
would be written differently in modern Python (in particular by making use of a
generator function instead of an Iterator class).
So I'd like to suggest the following:
-) For the modules whose usability is dubious in 2008, let's check on the
mailing list if anybody is still using them. If not, we can simply deprecate
them.
-) For the modules that are still useful, use try/except clauses to check for
the necessary attributes. The current function checks for 'read', 'readline',
'readlines', and '__iter__', whereas the parser probably only needs one of
them.
-) If possible, I'd prefer to convert to modern Python as much as possible
(though formally that is not within the scope of this bug report).
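A minimal sketch of the generator-based style suggested above, written for
current Python 3. The FASTA-like record format and the function names are
assumptions chosen for illustration; none of this is existing Biopython code.
The only thing required of the handle is that it can be iterated over line by
line, and that is tested with try/except rather than by type checking.

```python
import io


def iter_records(handle):
    """Return an iterator of (title, sequence) pairs from '>' records."""
    try:
        lines = iter(handle)
    except TypeError:
        # Fail here, at call time, rather than on the first next().
        raise TypeError("handle must support iteration over lines") from None
    return _generate_records(lines)


def _generate_records(lines):
    title, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line.strip())
    if title is not None:
        yield title, "".join(chunks)


handle = io.StringIO(">alpha\nACGT\nAC\n>beta\nTTTT\n")
for title, sequence in iter_records(handle):
    print(title, sequence)
```

The check lives in a plain wrapper function because the body of a generator
function does not start running until the first item is requested.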
From bugzilla-daemon at portal.open-bio.org Wed Jun 4 15:50:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Jun 2008 15:50:14 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
element access and slicing
In-Reply-To:
Message-ID: <200806041950.m54JoEPj029720@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2507
------- Comment #3 from jblanca at btc.upv.es 2008-06-04 15:50 EST -------
Created an attachment (id=927)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=927&action=view)
RichSeq proposal
I have coded a sequence class that fulfils the requirements that I would like
to see. It's very similar to SeqRecord, but it is not compatible with it. It
has no seq property, although that can be solved. The problem with SeqRecord is
that it is not possible to create a class with an __init__ compatible with Seq
and SeqRecord at the same time.
This proposed class is just a draft; it needs more work, but I would like to
receive comments about it.
It inherits from MutableSeq, so it should be named MutableRichSeq, but it seems
that I'm too lazy to type such a long name. I promise to change the name in a
later version and to create a RichSeq with Seq as parent.
Besides RichSeq, the attachment contains two other classes, RichFeature and
BioRange, but I will comment on those in another post.
I think that it is quite important to convert Seq and MutableSeq to new-style
classes. What do you think about that? With new-style classes we can use
properties.
From bugzilla-daemon at portal.open-bio.org Wed Jun 4 16:19:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Jun 2008 16:19:41 -0400
Subject: [Biopython-dev] [Bug 2508] New: NCBIStandalone.blastall: provide
support for '-F F' and make it safe
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2508
Summary: NCBIStandalone.blastall: provide support for '-F F' and
make it safe
Product: Biopython
Version: 1.44
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: mmokrejs at ribosome.natur.cuni.cz
The local NCBI blast by default masks low-complexity regions using the SEG algorithm.
I do not see a variable to affect this in NCBIStandalone.blastall().
Luckily, NCBIStandalone.blastall() is an unsafe function and does not check
whether I pass multiple arguments in a value expected to be a string or number.
Thus, I can do:
_blast_out, _error_info = NCBIStandalone.blastall('/usr/bin/blastall',
'blastn', blast_db, _blast_file, matrix='IDENTITY -F 0')
but imagine I would have done:
_blast_out, _error_info = NCBIStandalone.blastall('/usr/bin/blastall',
'blastn', blast_db, _blast_file, matrix='IDENTITY -F 0; rm -rf /etc/passwd')
The function should be protected against such attacks, as if it were
directly exposed to web users as a CGI script. I propose a similar defensive
strategy for all functions calling os.system(), os.exec*(), os.popen*(), etc.
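One way to get this kind of protection is to avoid the shell altogether and pass each value as its own list element. The following is only a minimal sketch under that assumption: build_blastall_command is a hypothetical helper, not part of NCBIStandalone, and the database and file names are placeholders. It only builds the argument list and does not run blastall.

```python
import shlex


def build_blastall_command(blastall_exe, program, database, infile, **options):
    """Build an argument list for blastall (hypothetical helper, not Biopython API).

    Each option value becomes exactly one list element, so a value containing
    spaces or ';' cannot be split into extra arguments or extra commands,
    provided the list is executed without a shell.
    """
    cmd = [blastall_exe, "-p", program, "-d", database, "-i", infile]
    for flag, value in sorted(options.items()):
        cmd.extend(["-" + flag, str(value)])
    return cmd


hostile = "IDENTITY -F 0; rm -rf /etc/passwd"
cmd = build_blastall_command(
    "/usr/bin/blastall", "blastn", "my_db", "query.fasta", M=hostile, F="F"
)
# cmd could now be handed to subprocess.Popen(cmd) without shell=True;
# blastall would simply receive one odd-looking matrix name.

# If a shell command string cannot be avoided, shlex.quote() escapes one value.
quoted = shlex.quote(hostile)
```

With this approach the '-F F' case from the first half of the report is just another keyword argument, and nothing in a value is ever interpreted by a shell.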
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Thu Jun 5 04:52:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 04:52:47 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
element access and slicing
In-Reply-To:
Message-ID: <200806050852.m558qlPF031059@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2507
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-05 04:52 EST -------
I replied to comment 2 on the mailing list. I had intended this particular
bugzilla entry (bug 2507) to be very narrow in scope - purely a small backwards
compatible change to the current SeqRecord.
Some of the questions in comment 3 might have fit better on Bug 2351 although
it's getting rather long. Rather than taking this issue further off topic, I'll
reply on the mailing list again.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From biopython at maubp.freeserve.co.uk Thu Jun 5 05:17:00 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 5 Jun 2008 10:17:00 +0100
Subject: [Biopython-dev] Fwd: Re: sequence class proposal
In-Reply-To: <320fb6e00806021251q6cc1a7e8p36125c1326ab7a14@mail.gmail.com>
References: <1212433879.484445d7a6117@webmail.upv.es>
<320fb6e00806021251q6cc1a7e8p36125c1326ab7a14@mail.gmail.com>
Message-ID: <320fb6e00806050217y1c437b01qa7fd21d75a609e8c@mail.gmail.com>
This is in reply to Jose's comment 3 on bug 2507, which was quite broad.
http://bugzilla.open-bio.org/show_bug.cgi?id=2507#c3
> I have coded a sequence class that fulfils the requirements that I
> would like to see. It's very similar to SeqRecord, but it is not compatible
> with it. It has no seq property, although that can be solved. The problem
> with SeqRecord is that it is not possible to create a class with an __init__
> compatible with Seq and SeqRecord at the same time.
Even if one day the SeqRecord is a subclass of the Seq object, there
is no requirement that it have the same __init__ arguments. In fact, they
have to be different because for a SeqRecord you should also supply an
identifier (and potentially a name, description and other annotation).
> This proposed class is just a draft, it needs more work but I would like to
> receive comments about it. It inherits from MutableSeq so it should be
> named MutableRichSeq, but it seems that I'm too lazy to type such a long name,
> I promise to change the name in a later version and to create a RichSeq
> with Seq as parent.
I agree with you here that when getting a single letter (amino acid or
nucleotide) from a sequence with per-letter-annotation, e.g.
my_sequence[5], it would be very nice to have the
per-letter-annotation like the quality included. This does mean the
object returned can't just be a single one character string. However,
because the current Seq and MutableSeq classes return a simple string,
unless we return a subclass of a string, this risks breaking other
people's code. So, I would conclude that Seq needs to subclass a
string BEFORE we start including support for per-letter-annotation.
Ideally we would have alphabet aware versions of all the string
functions before we made this change (see Bug 2351).
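To make the "subclass of a string" idea concrete, here is a small sketch. AnnotatedLetter is a made-up name used only for illustration; nothing like it exists in Biopython, and the quality value is arbitrary.

```python
class AnnotatedLetter(str):
    """A one-character string that also carries per-letter annotation.

    Hypothetical class for illustration only.
    """

    def __new__(cls, letter, **annotation):
        # str is immutable, so the value must be set in __new__, not __init__.
        obj = super().__new__(cls, letter)
        obj.annotation = dict(annotation)
        return obj


letter = AnnotatedLetter("A", quality=37)

# Existing code that expects a plain one-character string keeps working.
is_adenine = (letter == "A")

# Inherited str methods return plain str objects, so the annotation is lost
# unless each method is overridden.
lowered = letter.lower()
```

The last two lines show why this is only a partial answer: comparisons with plain strings still work, but every inherited string method would need an annotation-aware version.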
> Besides RichSeq there are two other classes in the attachment, RichFeature
> and BioRange, but I will comment on those in another post.
Your BioRange and RichFeature classes seem somewhat similar to the
current SeqFeature class with its locations (and sub features).
> I think that it is quite important to convert Seq and MutableSeq to new-style
> classes, what do you think about that? With new-style classes we can use properties.
I have been thinking about deprecating the Seq.data property (and also
the MutableSeq.data property). The data string (or array) should really be a
private implementation detail, perhaps Seq._data following the
leading-underscore convention for private attributes. We can then add property
methods to make the Seq.data available (perhaps with a deprecation warning).
Peter
From bugzilla-daemon at portal.open-bio.org Thu Jun 5 05:36:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 05:36:18 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
element access and slicing
In-Reply-To:
Message-ID: <200806050936.m559aINS001028@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2507
------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-05 05:36 EST -------
Created an attachment (id=928)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=928&action=view)
Patch to Bio/SeqRecord.py adding __getitem__ and __len__ and __iter__
Patch based on my comment 1, with addition of __len__ allowing len(my_record)
rather than len(my_record.seq) and an explicit __iter__ method (although this
is not required, it lets us give a doc string).
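The shape of such a change can be sketched with a toy class. This is not the attached patch: ToySeqRecord is a stand-in written for illustration, the sequence is a plain string, and slicing here simply returns the sliced sequence, which may differ from what the real patch does.

```python
class ToySeqRecord(object):
    """Toy stand-in for a SeqRecord (illustration only)."""

    def __init__(self, seq, id="<unknown id>"):
        self.seq = seq
        self.id = id

    def __getitem__(self, index):
        """Return a letter (integer index) or sub-sequence (slice)."""
        return self.seq[index]

    def __len__(self):
        """Return the length of the sequence."""
        return len(self.seq)

    def __iter__(self):
        """Iterate over the letters in the sequence."""
        # Python would fall back to __getitem__ without this method;
        # defining it explicitly gives a place for the docstring.
        return iter(self.seq)


record = ToySeqRecord("ACGTAC", id="example")
sixth_letter = record[5]
letters = list(record)
```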
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Thu Jun 5 06:18:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 06:18:11 -0400
Subject: [Biopython-dev] [Bug 2509] New: Deprecating the .data property of
the Seq and MutableSeq objects
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2509
Summary: Deprecating the .data property of the Seq and MutableSeq
objects
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
OtherBugsDependingOnThis: 2351
In anticipation that the Seq and MutableSeq objects will eventually subclass
the python string, their data property is not needed and is confusing. The
following patch will replace it with new-style class property methods and a
docstring declaring it to be deprecated.
In the case of the Seq object, the sequence should be read only but the user
can currently modify the data property in place.
In the case of the MutableSeq, the fact that it is internally an array of
characters should be a private implementation detail.
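A minimal sketch of the idea, assuming a new-style class: ToySeq is a stand-in, not the real Bio.Seq.Seq, and the wording of the warning (including the suggestion to use str()) is an assumption rather than the text of the patch.

```python
import warnings


class ToySeq(object):
    """Toy stand-in for a Seq-like class (illustration only)."""

    def __init__(self, data):
        self._data = data  # private by the leading-underscore convention

    def __str__(self):
        return self._data

    @property
    def data(self):
        """Sequence as a string (DEPRECATED)."""
        warnings.warn(
            "The .data property is deprecated; use str(my_seq) instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._data


my_seq = ToySeq("ACGT")
as_text = str(my_seq)  # preferred access, no warning
```

Because no setter is defined, assigning to my_seq.data raises AttributeError, which gives the read-only behaviour described for the Seq object. The decorator syntax needs Python 2.4 or later; on Python 2.3 the same property can be created with data = property(_get_data).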
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Thu Jun 5 06:18:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 06:18:14 -0400
Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string,
even subclass string?
In-Reply-To:
Message-ID: <200806051018.m55AIE7S003198@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2351
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
BugsThisDependsOn| |2509
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Thu Jun 5 06:47:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 06:47:43 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
element access and slicing
In-Reply-To:
Message-ID: <200806051047.m55AlhBe004755@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2507
------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-05 06:47 EST -------
Note that adding __len__ has a knock-on effect when dealing with SeqRecord
objects with a zero length sequence - they now evaluate to False rather than
True.
This was an issue for some of the unit tests where "if record" was used rather
than the more explicit "if record is not None".
This change could therefore have unexpected side effects in existing scripts,
however adding __len__ is required if we intend to make the SeqRecord act more
like the Seq object.
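The side effect can be reproduced with a toy class. RecordWithLen and count_until_missing are hypothetical names for this sketch, and a list ending in None stands in for an iterator whose next() returns None at the end.

```python
class RecordWithLen(object):
    """Toy record that defines __len__ but no explicit truth value."""

    def __init__(self, seq):
        self.seq = seq

    def __len__(self):
        return len(self.seq)


def count_until_missing(items, use_identity_test):
    """Count items up to the first one that the chosen test treats as missing."""
    count = 0
    for item in items + [None]:
        if use_identity_test:
            missing = item is None   # explicit "is not None" style
        else:
            missing = not item       # old "if record:" style
        if missing:
            break
        count += 1
    return count


records = [RecordWithLen("ACGT"), RecordWithLen(""), RecordWithLen("GG")]
seen_with_truth_test = count_until_missing(records, use_identity_test=False)
seen_with_identity_test = count_until_missing(records, use_identity_test=True)
```

The truth-value test stops at the zero-length record, while the identity test processes all three records.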
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Thu Jun 5 07:03:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 07:03:27 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
for '-F F' and make it safe
In-Reply-To:
Message-ID: <200806051103.m55B3RUU005472@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2508
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-05 07:03 EST -------
You seem to have identified two issues. Adding support for -F should be fairly
easy.
For the security issue, the caller should be validating their input. If
running from a web-server, the permissions should also be restricted - failing
to do this is asking for trouble.
However, defence in layers would be good. Would you suggest a simple check for
the ";" character? What about escaped semi-colons? Also this is a platform
dependent issue. The ";" character is Unix only. At the Windows command line
you have to use an &&.
Do you have a patch in mind?
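One possible answer to the "which characters" question is to accept only a known-safe set rather than reject known-bad ones, which sidesteps escaped semi-colons and the Unix/Windows difference. This is a sketch only: check_option_value is a hypothetical helper and the permitted character set is an assumption that would need tuning (it rejects spaces, so file names containing spaces would fail).

```python
import re

# Letters, digits and a few punctuation characters common in file names,
# matrix names and numbers. Everything else is rejected.
_SAFE_VALUE = re.compile(r"[A-Za-z0-9_./+-]+")


def check_option_value(value):
    """Return value as a string, or raise ValueError if it looks unsafe."""
    text = str(value)
    if not _SAFE_VALUE.fullmatch(text):
        raise ValueError("Unsafe characters in option value: %r" % text)
    return text


matrix = check_option_value("IDENTITY")
expectation = check_option_value(0.001)
```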
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Thu Jun 5 08:56:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 08:56:21 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
for '-F F' and make it safe
In-Reply-To:
Message-ID: <200806051256.m55CuLfC010670@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2508
------- Comment #2 from mmokrejs at ribosome.natur.cuni.cz 2008-06-05 08:56 EST -------
For the latter issue, I would use some python library to escape shell
metacharacters. cgi.escape() doesn't do what I would like it to. Or cgi.wrap()?
Google search returned some hints:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/498202
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/66012
http://e-articles.info/e/a/title/Command-Injection/
https://bugs.gentoo.org/show_bug.cgi?id=187971#c5
https://bugs.gentoo.org/show_bug.cgi?id=187971#c23
http://mail.python.org/pipermail/python-3000/2007-May/007192.html
http://www.owasp.org/index.php/Interpreter_Injection
http://www.velocityreviews.com/forums/t352309-sql-escaping-module.html
One could learn from or even use escaping functions such as MySQLdb.escape()
or MySQLdb.connection.escape_string() but I don't think it is a complete
solution. I will try to think about it more later.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Thu Jun 5 09:25:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 09:25:43 -0400
Subject: [Biopython-dev] [Bug 2494] _retrieve_taxon in BioSQL.py needs
urgent optimization
In-Reply-To:
Message-ID: <200806051325.m55DPhrQ012033@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2494
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-05 09:25 EST -------
I've committed this patch to CVS as part of BioSQL/BioSeq.py revision 1.24
If you could update your installation of Biopython to CVS and test this please
Eric, then I think we can mark this bug as fixed.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Thu Jun 5 09:29:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 09:29:25 -0400
Subject: [Biopython-dev] [Bug 2509] Deprecating the .data property of the
Seq and MutableSeq objects
In-Reply-To:
Message-ID: <200806051329.m55DTP30012244@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2509
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-05 09:29 EST -------
Created an attachment (id=929)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=929&action=view)
Patch to Bio/Seq.py
This turns out to be quite a big change, and while the unit tests still pass
more extensive testing would be a good idea.
Alternatively, we could just leave .data exposed as a read only property, and
switch to ._data (or a string subclass) instead.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Thu Jun 5 13:55:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jun 2008 13:55:02 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200806051755.m55Ht2TS024644@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #7 from cracka80 at gmail.com 2008-06-05 13:55 EST -------
I understand your approach that these functions should be converted to modern
Python, but it must also be remembered that Biopython as a whole is Python
2.3-compatible, so care must be taken not to modernise code too much. I can't
remember when iterators were phased in, but it should be possible, I think it
was around 2.2 anyway.
(In reply to comment #6)
> Two things:
> 1) Some of the code that does type checking for file-like-ness seems to be
> quite old and possibly outdated (e.g. Gobase.Iterator). We should take this
> opportunity to go through these modules and check if they are still useful.
> 2) Many of these modules (especially the ones that use an "Iterator" class)
> would be written differently in modern Python (in particular by making use of a
> generator function instead of an Iterator class).
>
> So I'd like to suggest the following:
> -) For the modules whose usability is dubious in 2008, let's check on the
> mailing list if anybody is still using them. If not, we can simply deprecate
> them.
> -) For the modules that are still useful, use try/except clauses to check for
> the necessary attributes. The current function checks for 'read', 'readline',
> 'readlines', and '__iter__', whereas the parser probably only needs one of
> them.
> -) If possible, I'd prefer to convert to modern Python as much as possible
> (though formally that is not within the scope of this bug report).
>
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sat Jun 7 04:26:54 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 7 Jun 2008 04:26:54 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200806070826.m578Qsj4019312@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #8 from mdehoon at ims.u-tokyo.ac.jp 2008-06-07 04:26 EST -------
(In reply to comment #7)
> I understand your approach that these functions should be converted to modern
> Python, but it must also be remembered that Biopython as a whole is Python
> 2.3-compatible, so care must be taken not to modernise code too much. I can't
> remember when iterators were phased in, but it should be possible, I think it
> was around 2.2 anyway.
>
Bio.Blast.NCBIXML already uses generator functions to return iterators, so I
think we are fine as far as compatibility with Python 2.3 and later is
concerned.
I'll ask on the mailing list if Bio.Gobase has any users, to get started.
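For reference, a generator function of the kind discussed here might look like the sketch below. iterate_records is a hypothetical name and the "//" record terminator is an assumption chosen for the example; the point is that the handle only has to be iterable line by line, so no type checking for file-like-ness is needed.

```python
import io


def iterate_records(handle):
    """Yield each record as a list of lines, splitting on '//' lines.

    The handle can be an open file, a StringIO object, or any other
    iterable that yields lines.
    """
    record = []
    for line in handle:
        line = line.rstrip("\n")
        if line == "//":
            yield record
            record = []
        else:
            record.append(line)
    if record:
        # Final record without a terminator line.
        yield record


handle = io.StringIO("ID one\nSQ ACGT\n//\nID two\nSQ GG\n//\n")
records = list(iterate_records(handle))
```

The same function accepts a plain list of lines, which is convenient in unit tests.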
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From mjldehoon at yahoo.com Sat Jun 7 04:35:05 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 7 Jun 2008 01:35:05 -0700 (PDT)
Subject: [Biopython-dev] Bio.Gobase, anybody?
Message-ID: <844450.31822.qm@web62415.mail.re1.yahoo.com>
Hi everybody,
As part of bug report 2454:
http://bugzilla.open-bio.org/show_bug.cgi?id=2454,
I started looking at the Bio.Gobase module.
This module provides access to the gobase database:
http://megasun.bch.umontreal.ca/gobase/
This module is about seven years old and (AFAICT)
is not actively maintained. We don't have documentation
for this module, but the unit tests suggest that it
parses HTML files from gobase. I am not sure exactly
where the HTML files came from, but I doubt that
after seven years this still works.
So I was wondering:
Does anybody use Bio.Gobase?
If not, I suggest we deprecate it for the next release,
and remove it in some future release.
If there are users, we need to make some (small) changes
to this module (that is what the original bug report
was about).
--Michiel.
From bugzilla-daemon at portal.open-bio.org Mon Jun 9 08:45:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 9 Jun 2008 08:45:24 -0400
Subject: [Biopython-dev] [Bug 2511] New: setup.py problem with del
sys.modules["Martel"]
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2511
Summary: setup.py problem with del sys.modules["Martel"]
Product: Biopython
Version: Not Applicable
Platform: PC
OS/Version: Mac OS
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
I'm currently trying to install Biopython from source (CVS) on a clean Mac OS X
machine, without reportlab, Numeric or mxTextTools. I've run into a small
issue with "python setup.py build" related to the testing for an existing
Martel distribution (since Martel has been distributed separately from
Biopython before) due to the lack of mxTextTools.
Traceback (most recent call last):
File "setup.py", line 508, in <module>
'Bio.PopGen': ['SimCoal/data/*.par'],
File
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/core.py",
line 151, in setup
dist.run_commands()
File
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/dist.py",
line 974, in run_commands
self.run_command(cmd)
File
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/dist.py",
line 994, in run_command
cmd_obj.run()
File
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/command/build.py",
line 112, in run
self.run_command(cmd_name)
File
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/cmd.py",
line 333, in run_command
self.distribution.run_command(command)
File
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/distutils/dist.py",
line 994, in run_command
cmd_obj.run()
File "setup.py", line 157, in run
if not is_Martel_installed():
File "setup.py", line 292, in is_Martel_installed
del sys.modules["Martel"] # Delete the old version of Martel.
The function is_Martel_installed() starts by trying to load the bundled
Martel, by calling can_import("Martel"). This is failing with an ImportError
from mxTextTools - and hence the Martel version of the bundled copy cannot be
determined. The next line of is_Martel_installed() causes the problem:
del sys.modules["Martel"]
I think this only makes sense if the module could be imported, patch to follow.
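The guard can be sketched as follows. This is not the attached patch: can_import here is a simplified reimplementation for illustration, forget_module is a hypothetical helper, and the module name in the usage lines is deliberately one that does not exist.

```python
import sys


def can_import(module_name):
    """Return True if the named module can be imported (simplified sketch)."""
    try:
        __import__(module_name)
    except ImportError:
        return False
    return True


def forget_module(module_name):
    """Drop a module from sys.modules if present; return True if it was there.

    Unlike a bare "del sys.modules[name]", this does not raise KeyError
    when an earlier import attempt failed and left no entry behind.
    """
    return sys.modules.pop(module_name, None) is not None


name = "no_such_module_for_this_sketch"
imported = can_import(name)
removed = forget_module(name)  # safe even though the import failed
```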
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Jun 9 08:46:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 9 Jun 2008 08:46:51 -0400
Subject: [Biopython-dev] [Bug 2511] setup.py problem with del
sys.modules["Martel"]
In-Reply-To:
Message-ID: <200806091246.m59Ckpts011798@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2511
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-09 08:46 EST -------
Created an attachment (id=930)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=930&action=view)
Patch to setup.py
How does this look Michiel?
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From biopython at maubp.freeserve.co.uk Tue Jun 10 07:37:42 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 10 Jun 2008 12:37:42 +0100
Subject: [Biopython-dev] Giving the SeqRecord a length? Evaluating it as a
boolean
Message-ID: <320fb6e00806100437n21e53369p36c85a810007ca19@mail.gmail.com>
Something we've discussed before is making the SeqRecord more like a
Seq object, perhaps even subclassing it. I've got a patch on Bug 2507
to make some small steps in this direction - accessing elements of the
sequence by indexing the SeqRecord, i.e. letter = my_seq_record[5], or
iterating over the letters in a SeqRecord's sequence.
http://bugzilla.open-bio.org/show_bug.cgi?id=2507
In addition, I would like to give the SeqRecord a length, allowing
len(my_seq_record) rather than len(my_seq_record.seq). However, this
has a side effect on the evaluation of a SeqRecord as a boolean.
Before, any sequence was True, but if we add the __len__ method then
any SeqRecord with a zero length sequence will evaluate as False.
This is a real issue, for example you can have GenBank files without a
sequence (see our unit test cases). One example where this is
important is if you are using an iterator via the .next() method and
had been checking for a returned None by using "if record:" (something
some of the older unit tests were doing) you would have to start using
"if record is not None:" instead.
If the old behaviour is desirable (evaluating a SeqRecord as a boolean
is always True), we could implement a __nonzero__ method to preserve
it, see: http://docs.python.org/ref/customization.html
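A sketch of that combination, using a toy class rather than the real SeqRecord. Python 2 looks up __nonzero__ for truth testing while Python 3 looks up __bool__, so the sketch defines both names.

```python
class ToySeqRecord(object):
    """Toy stand-in for a SeqRecord (illustration only)."""

    def __init__(self, seq):
        self.seq = seq

    def __len__(self):
        return len(self.seq)

    def __bool__(self):
        # Every record counts as true, whatever the sequence length.
        return True

    __nonzero__ = __bool__  # name used by Python 2


empty_record = ToySeqRecord("")
length = len(empty_record)
truth = bool(empty_record)
```

With both methods defined, len() reports zero for a record without sequence while "if record:" still succeeds, so older scripts keep their behaviour.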
What do people think? Would adding a __len__ method to the SeqRecord
cause trouble?
Peter
From mjldehoon at yahoo.com Tue Jun 10 19:17:56 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 10 Jun 2008 16:17:56 -0700 (PDT)
Subject: [Biopython-dev] Giving the SeqRecord a length? Evaluating it as
a boolean
In-Reply-To: <320fb6e00806100437n21e53369p36c85a810007ca19@mail.gmail.com>
Message-ID: <797428.30617.qm@web62402.mail.re1.yahoo.com>
+1 for adding a __len__ method, with a __nonzero__ method to let all SeqRecord objects evaluate as true.
--Michiel.
Peter wrote: Something we've discussed before is making the SeqRecord more like a
Seq object, perhaps even subclassing it. I've got a patch on Bug 2507
to make some small steps in this direction - accessing elements of the
sequence by indexing the SeqRecord, i.e. letter = my_seq_record[5], or
iterating over the letters in a SeqRecord's sequence.
http://bugzilla.open-bio.org/show_bug.cgi?id=2507
In addition, I would like to give the SeqRecord a length, allowing
len(my_seq_record) rather than len(my_seq_record.seq). However, this
has a side effect on the evaluation of a SeqRecord as a boolean.
Before, any sequence was True, but if we add the __len__ method then
any SeqRecord with a zero length sequence will evaluate as False.
This is a real issue, for example you can have GenBank files without a
sequence (see our unit test cases). One example where this is
important is if you are using an iterator via the .next() method and
had been checking for a returned None by using "if record:" (something
some of the older unit tests were doing) you would have to start using
"if record is not None:" instead.
If the old behaviour is desirable (evaluating a SeqRecord as a boolean
is always True), we could implement a __nonzero__ method to preserve
it, see: http://docs.python.org/ref/customization.html
What do people think? Would adding a __len__ method to the SeqRecord
cause trouble?
Peter
_______________________________________________
Biopython-dev mailing list
Biopython-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython-dev
From bugzilla-daemon at portal.open-bio.org Tue Jun 10 19:30:20 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 10 Jun 2008 19:30:20 -0400
Subject: [Biopython-dev] [Bug 2511] setup.py problem with del
sys.modules["Martel"]
In-Reply-To:
Message-ID: <200806102330.m5ANUKfo019481@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2511
------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2008-06-10 19:30 EST -------
(In reply to comment #1)
> Created an attachment (id=930)
> --> (http://bugzilla.open-bio.org/attachment.cgi?id=930&action=view)
> Patch to setup.py
>
> How does this look Michiel?
>
That looks fine to me, though eventually I would prefer to get rid of the
dependence on Martel/mxTextTools altogether.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Jun 10 19:42:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 10 Jun 2008 19:42:52 -0400
Subject: [Biopython-dev] [Bug 2511] setup.py problem with del
sys.modules["Martel"]
In-Reply-To:
Message-ID: <200806102342.m5ANgqct019925@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2511
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-10 19:42 EST -------
In reply to comment 2, would it make sense for the unit test framework to treat
the mxTextTools (or reportlab, or Numeric) import errors as a missing external
dependency?
In the unit tests we used to "ignore" any tests which failed with an
ImportError, but have now switched to our own MissingExternalDependencyError
exception.
We want to distinguish ImportErrors which are external to Biopython (and
therefore can be considered as missing dependencies) from those internal to
Biopython (perhaps due to refactoring or removal of code - a real unit test
failure). One way to do this would be in the bits of Biopython that try to
import mxTextTools (or any other module) to raise
MissingExternalDependencyError (or something that is a subclass of both
MissingExternalDependencyError and the built in ImportError).
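The "subclass of both" option could look like the sketch below. The MissingExternalDependencyError defined here is a local stand-in for the unit test framework's exception, and MissingPythonDependencyError and import_optional are hypothetical names introduced for the example.

```python
class MissingExternalDependencyError(Exception):
    """Stand-in for the exception used by the unit test framework."""


class MissingPythonDependencyError(MissingExternalDependencyError, ImportError):
    """Raised when an optional third-party Python module is not installed."""


def import_optional(module_name):
    """Import an optional module, converting ImportError for the test suite."""
    try:
        return __import__(module_name)
    except ImportError:
        raise MissingPythonDependencyError(
            "Install %s if you want to use this module." % module_name
        )


try:
    import_optional("no_such_module_for_this_sketch")
except MissingExternalDependencyError as err:
    caught = err
```

Code that already catches ImportError keeps working, while the test runner can catch MissingExternalDependencyError and treat the test as skipped rather than failed.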
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Wed Jun 11 02:54:32 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 11 Jun 2008 02:54:32 -0400
Subject: [Biopython-dev] [Bug 2516] New: Make it clear what is numeric and
what is numpy
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2516
Summary: Make it clear what is numeric and what is numpy
Product: Biopython
Version: 1.45
Platform: PC
URL: http://www.biopython.org/DIST/docs/install/Installation.
html
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Documentation
AssignedTo: biopython-dev at biopython.org
ReportedBy: mmokrejs at ribosome.natur.cuni.cz
Hi,
although both packages are from the same source site, numpy is the newer
implementation whereas numeric is the old, deprecated implementation, right?
Why do you say in the installation docs the following?
"The Numerical Python distribution (also known an Numeric or Numpy) is a fast
implementation of arrays and associated array functionality. This is important
for a number of Biopython modules that deal with number processing. The main
web site for Numeric is: http://sourceforge.net/projects/numpy and downloads
are available from:..."
I think this is misleading.
BTW, is numpy-1.1.0 supported?
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Wed Jun 11 04:47:32 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 11 Jun 2008 04:47:32 -0400
Subject: [Biopython-dev] [Bug 2511] setup.py problem with del
sys.modules["Martel"]
In-Reply-To:
Message-ID: <200806110847.m5B8lWxd010254@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2511
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-11 04:47 EST -------
Patch checked into CVS as Biopython/setup.py revision 1.133, marking this bug
as fixed.
The issue I raised in comment 3 is still outstanding (external ImportErrors and
the unit tests). We may want to file a separate bug, or discuss this on the
dev mailing list.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Wed Jun 11 04:53:30 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 11 Jun 2008 04:53:30 -0400
Subject: [Biopython-dev] [Bug 2516] Make it clear what is numeric and what
is numpy
In-Reply-To:
Message-ID: <200806110853.m5B8rU2t010552@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2516
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-11 04:53 EST -------
That text is rather out of date - if you are familiar with the history of
Numeric, numarray and numpy you'll know that the old module used with "import
Numeric" was called Numerical Python or NumPy for short. This shorthand was
used in lots of documentation (not just in Biopython). I think the choice to
call the third generation of the array packages numpy has caused a lot of
confusion.
See http://numpy.scipy.org/#older_array
We had updated the Biopython website and other bits of documentation, but had
missed this one. Thank you for pointing this out.
P.S. Supporting numpy instead of Numeric is Biopython Bug 2251.
From bugzilla-daemon at portal.open-bio.org Wed Jun 11 05:04:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 11 Jun 2008 05:04:47 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
element access and slicing
In-Reply-To:
Message-ID: <200806110904.m5B94li8011303@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2507
------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-11 05:04 EST -------
I raised the issue of evaluating a SeqRecord as a boolean with a proposal that
would add __len__ but also add __nonzero__ to ensure that any SeqRecord
evaluates as True (even if the sequence is of length zero):
http://lists.open-bio.org/pipermail/biopython-dev/2008-June/003756.html
Michiel was in favour of this:
> +1 for adding a __len__ method, with a __nonzero__ method to let all SeqRecord
> objects evaluate as true.
The patch isn't ready yet because in addition it doesn't yet deal with the
SeqFeature objects. I think the SeqFeature class needs a _shift(offset) method
to return a copy of itself with its location (and the locations of any
sub-features) adjusted.
I'm still not sure about handling strides, and I am tempted to rule that if a
stride other than one is used then the features of the SeqRecord are lost.
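A minimal sketch of the truth-value behaviour proposed above, using a stand-in
class rather than the real SeqRecord (the name ToyRecord is hypothetical).
Python 2 consults __nonzero__ and Python 3 consults __bool__, so the sketch
defines both:

```python
class ToyRecord(object):
    """Hypothetical stand-in for SeqRecord; not the Biopython class."""

    def __init__(self, seq, id="<unknown id>"):
        self.seq = seq
        self.id = id

    def __len__(self):
        # The length of the record is the length of its sequence.
        return len(self.seq)

    def __bool__(self):
        # Without this method Python falls back on __len__, so a record
        # holding an empty sequence would evaluate as False.
        return True

    # Python 2 looks for __nonzero__ instead of __bool__.
    __nonzero__ = __bool__


empty = ToyRecord("")
```

With both methods in place, "if record:" stays safe to use as an "is there a
record at all?" test even when len(record) is zero.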
From bugzilla-daemon at portal.open-bio.org Wed Jun 11 09:57:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 11 Jun 2008 09:57:56 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
element access and slicing
In-Reply-To:
Message-ID: <200806111357.m5BDvu1I024400@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2507
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #928 is|0 |1
obsolete| |
------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-11 09:57 EST -------
Created an attachment (id=937)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=937&action=view)
Patch to Bio/SeqRecord.py and Bio/SeqFeature.py
This modifies the SeqRecord to give it __getitem__ (supporting sliced
annotations including features), __len__ (to return the length of the
sequence), __nonzero__ (to ensure any SeqRecord evaluates as True regardless of
the length of its sequence) and __iter__ (to explicitly support iteration over
the sequence with a docstring). As part of this, assorted objects in
SeqFeature.py get a private _shift() method taking an integer offset to return
a self copy with an adjusted location.
Note that slices with a stride (other than one) will result in the features
being lost. Handling (positive) strides would require complicated
consideration about if an exact location is still present, and if not replacing
it with either a fuzzy position or a range. Negative strides are worse!
The current set of unit tests seem fine, but additional checks would need to be
added to validate this new behaviour.
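The slicing rules described above can be sketched roughly as follows. This is
not the attached patch: ToyFeature and ToyRecord are hypothetical stand-ins,
locations are assumed to be simple half-open integer ranges, and dropping any
feature that is not wholly inside the slice is an assumption of this sketch.

```python
class ToyFeature(object):
    """Hypothetical feature with a half-open [start, end) location."""

    def __init__(self, start, end, type="misc"):
        self.start = start
        self.end = end
        self.type = type

    def _shift(self, offset):
        # Return a copy with the location moved by offset; self is unchanged.
        return ToyFeature(self.start + offset, self.end + offset, self.type)


class ToyRecord(object):
    """Hypothetical stand-in for SeqRecord; not the Biopython class."""

    def __init__(self, seq, features=None):
        self.seq = seq
        self.features = list(features or [])

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        if isinstance(index, int):
            # A single position returns just that letter.
            return self.seq[index]
        start, stop, step = index.indices(len(self.seq))
        if step != 1:
            # Strides other than one: keep the sequence, lose the features.
            return ToyRecord(self.seq[index])
        # Keep only features lying entirely inside the slice, re-based to it.
        kept = [f._shift(-start) for f in self.features
                if start <= f.start and f.end <= stop]
        return ToyRecord(self.seq[start:stop], kept)


record = ToyRecord("AAACCCGGGTTT", [ToyFeature(3, 6, "gene")])
sub = record[2:8]
```

Here the feature covering positions 3 to 6 of the parent ends up at 1 to 4 in
the sliced record, so it still points at the same letters.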
From bugzilla-daemon at portal.open-bio.org Wed Jun 11 11:26:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 11 Jun 2008 11:26:59 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200806111526.m5BFQxMw029057@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #9 from mdehoon at ims.u-tokyo.ac.jp 2008-06-11 11:26 EST -------
I "fixed" SwissProt.SProt.Iterator by deprecating it. Instead of
SwissProt.SProt.Iterator, we recommend using Bio.SwissProt.parse and
Bio.SeqIO.parse.
Next on the to-do list is SwissProt.KeyWList.extract_keywords.
From bugzilla-daemon at portal.open-bio.org Thu Jun 12 10:23:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 12 Jun 2008 10:23:16 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200806121423.m5CENG95026678@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #10 from mdehoon at ims.u-tokyo.ac.jp 2008-06-12 10:23 EST -------
SwissProt.KeyWList.extract_keywords could only parse very old SwissProt files.
I deprecated it and wrote a new function "parse" that parses current SwissProt
files. This function does not do the file-like check.
Prosite.Iterator and Prosite.Prodoc.Iterator are next.
From fkauff at biologie.uni-kl.de Thu Jun 12 10:33:56 2008
From: fkauff at biologie.uni-kl.de (Frank Kauff)
Date: Thu, 12 Jun 2008 16:33:56 +0200
Subject: [Biopython-dev] CVS access and developers web site
In-Reply-To: <320fb6e00805291446x1cebf67bpe3e0818af5b9a7c5@mail.gmail.com>
References: <483E7578.50402@biologie.uni-kl.de>
<320fb6e00805291446x1cebf67bpe3e0818af5b9a7c5@mail.gmail.com>
Message-ID: <485133D4.2060405@biologie.uni-kl.de>
Peter Cock wrote:
> Hi Frank,
>
> I would try emailing support at helpdesk.open-bio.org using the email
> address associated with your CVS username. If you've changed email
> address, and you run into problems, I expect Michiel or I could vouch
> for you.
>
Is somebody monitoring that email address? I got an automated response
about two weeks ago, and then nothing happened.
> For the website, the wiki usernames are entirely separate and you
> should be able to create a new account if you don't have one already.
> If you want to update the tutorial new HTML and PDF files are loaded
> with each release from the version in CVS.
>
Thanks Peter, got access to the wiki and updated personal data.
Frank
> Peter
>
> On Thu, May 29, 2008 at 10:20 AM, Frank Kauff wrote:
>
>> Hi folks,
>>
>> although I've been quiet for a while, I'm still doing some changes to the
>> Nexus parser of biopython from time to time.... I totally lost my passwords
>> to access the repository. Could someone please send me a new password to get
>> write access to cvs? And I would also like to change the information on the
>> biopython developers web site, as they are somewhat outdated.
>> And is this the right place to ask for such things?
>>
>> Thanks!
>>
>> Frank
>>
>
>
From bugzilla-daemon at portal.open-bio.org Thu Jun 12 11:42:58 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 12 Jun 2008 11:42:58 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200806121542.m5CFgw9t029594@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #11 from cracka80 at gmail.com 2008-06-12 11:42 EST -------
Maybe it's a good idea for any parsers/iterators to just use the iterator-like
ability of file handles? Writers would have to function slightly differently,
but since file objects, StringIOs and any other file-like objects must provide
an __iter__ method, it's probably a good idea to take that into consideration
when developing a common interface. In addition, writers could output iterators
or generators, so that they can be chained together to operate on files.
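As a rough illustration of this idea (a sketch only, using a toy FASTA-like
format; parse_fasta and write_fasta are hypothetical names, not Biopython
functions), a parser can accept any iterable of lines and a writer can yield
lines so that the two can be chained:

```python
from io import StringIO


def parse_fasta(lines):
    """Yield (title, sequence) pairs from any iterable of text lines."""
    title, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line.strip())
    if title is not None:
        yield title, "".join(chunks)


def write_fasta(records):
    """Yield FASTA-formatted lines, so writers can be chained or redirected."""
    for title, seq in records:
        yield ">%s\n" % title
        yield "%s\n" % seq


# Works with a StringIO, an open file, or a plain list of lines.
handle = StringIO(">a\nACGT\nAC\n>b\nGG\n")
records = list(parse_fasta(handle))
# The writer's output is itself an iterable of lines, so it can be re-parsed.
round_trip = list(parse_fasta(write_fasta(records)))
```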
From bugzilla-daemon at portal.open-bio.org Fri Jun 13 12:24:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jun 2008 12:24:29 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200806131624.m5DGOTKw025954@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #12 from mdehoon at ims.u-tokyo.ac.jp 2008-06-13 12:24 EST -------
(In reply to comment #11)
> Maybe it's a good idea for any parsers/iterators to just use the iterator-like
> ability of file handles?
In principle, yes. In practice, it's not so easy because many parsers in
Biopython follow the framework in Bio.ParserSupport. These parsers are not
really written to deal with lines pulled one-by-one from a file handle. To
reconcile these two, I pull out data line-by-line from the file handle, store
it in a string, and then call the parser to parse it. This is not ideal, and it
may be a good idea for Biopython at some point to change its parser strategy.
> Writers would have to function slightly differently,
> but since file objects, StringIOs and any other file-like objects must provide
> an __iter__ method, it's probably a good idea to take that into consideration
> when developing a common interface. In addition, writers could output
> iterators or generators, so that they can be chained together to operate
> on files.
>
Writers should also be able to just print the record to the screen. I don't
see how that is easily achievable with generators.
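The bridging approach described above might look something like the following
sketch. It is not the code that was checked in: iter_record_text and
parse_record are hypothetical names, and the "//" record terminator is an
assumption borrowed from SwissProt-style flat files.

```python
from io import StringIO


def iter_record_text(handle, terminator="//"):
    """Yield the text of one record at a time from a file-like object.

    Assumes each record ends with a terminator line such as "//".
    """
    lines = []
    for line in handle:
        lines.append(line)
        if line.rstrip() == terminator:
            yield "".join(lines)
            lines = []
    if lines:
        # Trailing text without a terminator is still handed on.
        yield "".join(lines)


def parse_record(text):
    """Hypothetical string-based parser: returns the second word of the ID line."""
    for line in text.splitlines():
        if line.startswith("ID"):
            return line.split()[1]
    return None


handle = StringIO("ID   REC1\nSQ   ACGT\n//\nID   REC2\nSQ   GG\n//\n")
ids = [parse_record(text) for text in iter_record_text(handle)]
```

Only one record's text is held in memory at a time, and the handle only needs
to support iteration. On the writer side, one option would be to hand a
generator of lines to sys.stdout.writelines() to print a record to the screen,
though whether that is convenient enough is a separate question.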
From bugzilla-daemon at portal.open-bio.org Fri Jun 13 12:27:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jun 2008 12:27:47 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200806131627.m5DGRlTE026072@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #13 from mdehoon at ims.u-tokyo.ac.jp 2008-06-13 12:27 EST -------
Medline.Iterator, Prosite.Iterator, and Prosite.Prodoc.Iterator are now fixed.
From bugzilla-daemon at portal.open-bio.org Fri Jun 13 22:29:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jun 2008 22:29:13 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200806140229.m5E2TDdD014417@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #14 from mdehoon at ims.u-tokyo.ac.jp 2008-06-13 22:29 EST -------
I deprecated Bio.Gobase, since no users came forward on the mailing list.
Bio.Rebase is also problematic. It parses HTML from the Rebase database, but it
was written in 2000 and cannot parse current HTML from Rebase (which looks
completely different from the HTML used in 2000).
I'll ask on the mailing list if anybody is willing to update Bio.Rebase.
From mjldehoon at yahoo.com Fri Jun 13 22:34:05 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 13 Jun 2008 19:34:05 -0700 (PDT)
Subject: [Biopython-dev] Bio.Rebase
Message-ID: <237761.5963.qm@web62409.mail.re1.yahoo.com>
Hi everybody,
As part of bug #2454 on Bugzilla, I am looking at the Bio.Rebase module.
This module parses files (in HTML format) from the Rebase database:
http://rebase.neb.com/rebase/rebase.html
Unfortunately, since this module was written (in 2000) the HTML format used by the Rebase database has changed completely. This module is therefore not able to parse current Rebase HTML files.
Is anybody willing to update Bio.Rebase (either by updating the HTML parser, or preferably by writing a parser for plain-text output from Rebase)? If not, I think this module should be deprecated.
--Michiel.
From bugzilla-daemon at portal.open-bio.org Fri Jun 13 22:50:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jun 2008 22:50:42 -0400
Subject: [Biopython-dev] [Bug 2516] Make it clear what is numeric and what
is numpy
In-Reply-To:
Message-ID: <200806140250.m5E2ogvf014920@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2516
------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2008-06-13 22:50 EST -------
According to the Numerical Python website, the NumPy documentation will become
freely available on September 1, 2008. That would be a good time to start
thinking seriously about converting from the "old" Numerical Python to the
"new" NumPy 1.1.
From mjldehoon at yahoo.com Fri Jun 13 22:46:37 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 13 Jun 2008 19:46:37 -0700 (PDT)
Subject: [Biopython-dev] Bio.SCOP maintainer?
Message-ID: <523172.98428.qm@web62402.mail.re1.yahoo.com>
Still looking at Bug 2454
(http://bugzilla.open-bio.org/show_bug.cgi?id=2454).
To fix this bug, I'd like to make some changes to Bio.SCOP.
Is anybody currently maintaining Bio.SCOP? The changes I'd like to make are small, but it would be better to discuss with the Bio.SCOP maintainer (if there is one) so I won't get in their way.
--Michiel.
From bugzilla-daemon at portal.open-bio.org Sat Jun 14 05:52:09 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 14 Jun 2008 05:52:09 -0400
Subject: [Biopython-dev] [Bug 2488] Adding XML parsers to Bio.Entrez
In-Reply-To:
Message-ID: <200806140952.m5E9q9X9032018@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2488
mdehoon at ims.u-tokyo.ac.jp changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #8 from mdehoon at ims.u-tokyo.ac.jp 2008-06-14 05:52 EST -------
We now have parsers for XML returned by Entrez, provided the corresponding DTDs
are available. Bio/Entrez/DTDs contains most (all?) DTDs currently used by
Entrez. If later some DTDs appear to be missing, we can simply add them to
Bio/Entrez/DTDs.
From bugzilla-daemon at portal.open-bio.org Sat Jun 14 06:29:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 14 Jun 2008 06:29:12 -0400
Subject: [Biopython-dev] [Bug 2516] Make it clear what is numeric and what
is numpy
In-Reply-To:
Message-ID: <200806141029.m5EATC64001227@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2516
mdehoon at ims.u-tokyo.ac.jp changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #3 from mdehoon at ims.u-tokyo.ac.jp 2008-06-14 06:29 EST -------
Updated the installation instructions (in CVS, at least).
From p.j.a.cock at googlemail.com Sat Jun 14 18:51:26 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 14 Jun 2008 23:51:26 +0100
Subject: [Biopython-dev] CVS access and developers web site
In-Reply-To: <485133D4.2060405@biologie.uni-kl.de>
References: <483E7578.50402@biologie.uni-kl.de>
<320fb6e00805291446x1cebf67bpe3e0818af5b9a7c5@mail.gmail.com>
<485133D4.2060405@biologie.uni-kl.de>
Message-ID: <320fb6e00806141551t56422a98v752e34bbbb38d0aa@mail.gmail.com>
>> Hi Frank,
>>
>> I would try emailing support at helpdesk.open-bio.org using the email
>> address associated with your CVS username. If you've changed email
>> address, and you run into problems, I expect Michiel or I could vouch
>> for you.
>>
>
> Is somebody monitoring that email address? I got an automated response about
> two weeks ago, and then nothing happened.
>
Maybe someone is on holiday - or they are caught up with BOSC 2008
work? I can suggest a few specific people at OBF to try and contact
directly if you are still stuck.
In the short term, if there are any urgent fixes you think need to be
checked in, stick them on Bugzilla and I'm sure one of us will be able
to commit them on your behalf.
Peter
From bugzilla-daemon at portal.open-bio.org Sun Jun 15 03:03:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 15 Jun 2008 03:03:18 -0400
Subject: [Biopython-dev] [Bug 2468] Tutorial needs a fix: Bio.WWW.NCBI
In-Reply-To:
Message-ID: <200806150703.m5F73IF2007099@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2468
mdehoon at ims.u-tokyo.ac.jp changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2008-06-15 03:03 EST -------
I created a subsection Examples in the tutorial chapter on Bio.Entrez, and added
the example from section 2.5 and Martin's taxonomy example to it. With the
Bio.Entrez currently in CVS, finding the lineage works as follows:
>>> from Bio import Entrez
>>> handle = Entrez.esearch(db="Taxonomy", term="Cypripedioideae")
>>> record = Entrez.read(handle)
>>> record["IdList"]
['158330']
>>> handle = Entrez.efetch(db="Taxonomy", id="158330", retmode='xml')
>>> records = Entrez.read(handle)
>>> records[0]['Lineage']
'cellular organisms; Eukaryota; Viridiplantae; Streptophyta; Streptophytina;
Embryophyta; Tracheophyta; Euphyllophyta; Spermatophyta; Magnoliophyta;
Liliopsida; Asparagales; Orchidaceae'
From bugzilla-daemon at portal.open-bio.org Mon Jun 16 15:23:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 16 Jun 2008 15:23:43 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
element access and slicing
In-Reply-To:
Message-ID: <200806161923.m5GJNhZw012022@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2507
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #937 is|0 |1
obsolete| |
------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-16 15:23 EST -------
Created an attachment (id=942)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=942&action=view)
Patch to Bio/SeqRecord.py and Bio/SeqFeature.py
I've checked in the SeqRecord __len__ and __nonzero__ methods with CVS
Bio/SeqRecord.py revision 1.17
The earlier __getitem__ and __iter__ patch has been updated accordingly.
From bugzilla-daemon at portal.open-bio.org Mon Jun 16 16:08:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 16 Jun 2008 16:08:00 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To:
Message-ID: <200806162008.m5GK80bv014002@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=1944
------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-16 16:07 EST -------
Created an attachment (id=943)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=943&action=view)
Minimal __getitem__ method for generic alignment
This patch just adds a __getitem__ to the alignment which ONLY accepts a single
integer index and returns the corresponding SeqRecord object. I propose to add
this NOW, as I think even just this is a worthwhile improvement.
This is a natural expectation given the current __iter__ behaviour and the
model of the alignment as a list of SeqRecord objects. It's also part of the
richer behaviour discussed above, which we can add more easily if/when the
SeqRecord gets a __getitem__ method (bug 2507).
Comments on this particular patch? Should we add __len__ at the same time
giving the number of rows in the alignments?
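For discussion, the behaviour in question is roughly the following. This is a
sketch with a hypothetical ToyAlignment class, not the attached patch, and
plain strings stand in for the SeqRecord rows:

```python
class ToyAlignment(object):
    """Hypothetical alignment modelled as a list of records (rows)."""

    def __init__(self, records):
        self._records = list(records)

    def __iter__(self):
        # Iteration already goes row by row.
        return iter(self._records)

    def __len__(self):
        # Number of rows, not the alignment (column) length.
        return len(self._records)

    def __getitem__(self, index):
        # ONLY a single integer is accepted; slices and tuples are rejected
        # for now so that richer behaviour can be added later.
        if not isinstance(index, int):
            raise TypeError("Only a single integer row index is supported")
        return self._records[index]


alignment = ToyAlignment(["ACG-T", "AC-GT", "ACGGT"])
```

With this, alignment[i] returns the same object that iteration yields in
position i, and len(alignment) matches the number of rows iterated over.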
From jblanca at btc.upv.es Tue Jun 17 03:35:38 2008
From: jblanca at btc.upv.es (Jose Blanca)
Date: Tue, 17 Jun 2008 09:35:38 +0200
Subject: [Biopython-dev] [BioPython] Ace contig files in Bio.SeqIO or
Bio.AlignIO
In-Reply-To: <320fb6e00806160701l428584c0i30acac57338b9357@mail.gmail.com>
References: <320fb6e00806160701l428584c0i30acac57338b9357@mail.gmail.com>
Message-ID: <200806170935.38904.jblanca@btc.upv.es>
Hi:
My main use of the Alignment class is to parse Ace files. I've been thinking
about that problem recently. My proposal to modify SeqRecord was due to this
problem. I think that the best solution would be to treat the Alignment as a
sequence. The consensus would be the actual sequence and the aligned reads
would be features with per-base annotations. I've implemented such a class
and it works fine for me. In fact the Alignment class is just a wrapper
around a standard SeqRecord (I name it RichSeq in my implementation).
To do that you just need a SeqRecord with a __getitem__ method. You have
already proposed that, so that's not a problem.
Padding with spaces is not an option when you're dealing with genome-wide
alignments; that's one of the problems of the current Alignment class.
If you want I can send my implementation to the list, although it could take a
while because I've got my home computer dead.
Best regards,
Jose Blanca
On Monday 16 June 2008 16:01:31 Peter wrote:
> I've recently had to deal with some contig files in the Ace format
> (output by CAP3, but many assembly files will produce this output).
>
> We have a module for parsing Ace files in Biopython,
> Bio.Sequencing.Ace but I was wondering about integrating this into the
> Bio.SeqIO or Bio.AlignIO framework.
> http://www.biopython.org/wiki/SeqIO
> http://www.biopython.org/wiki/AlignIO
>
> I'd like to hear from anyone currently using Ace files, on how they
> tend to treat the data - and if they think a SeqRecord or Alignment
> based representation would be useful.
>
> Each contig in an Ace file could be treated as a SeqRecord using the
> consensus sequence. The identifiers of each sub-sequence used to
> build the consensus could be stored as database cross-references, or
> perhaps we could store these as SeqFeatures describing which part of
> the consensus they support. This would then fit into Bio.SeqIO quite
> well.
>
> Alternatively, each contig could be treated as an alignment (with a
> consensus) and integrated into Bio.AlignIO. One drawback is that
> doing this with the current generic alignment class would require
> padding the start and/or end of each sequence with gaps in order to
> make every sequence the same length. However, if we did this (or
> created a more specialised alignment class), the Ace file format would
> then fit into Bio.AlignIO too.
>
> So, Ace users - would either (or both) of the above approaches make
> sense for how you use the Ace contig files?
>
> Thanks
>
> Peter
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
--
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)
From biopython at maubp.freeserve.co.uk Tue Jun 17 04:46:22 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Jun 2008 09:46:22 +0100
Subject: [Biopython-dev] [BioPython] Ace contig files in Bio.SeqIO or
Bio.AlignIO
In-Reply-To: <200806170935.38904.jblanca@btc.upv.es>
References: <320fb6e00806160701l428584c0i30acac57338b9357@mail.gmail.com>
<200806170935.38904.jblanca@btc.upv.es>
Message-ID: <320fb6e00806170146j6f1843e6hed4166ad62c84423@mail.gmail.com>
On Tue, Jun 17, 2008 at 8:35 AM, Jose Blanca wrote:
> Hi:
> My main use of the Alignment class is to parse Ace files. I've been thinking
> about that problem recently. My proposal to modify SeqRecord was due to this
> problem. I think that the best solution would be to treat the Alignment as a
> sequence. The consensus would be the actual sequence and the aligned reads
> would be features with per-base annotations.
So integrating the "ace" format into Bio.SeqIO representing the
consensus sequence of each contig as a SeqRecord would be useful.
Initially I would try and represent the aligned reads as SeqFeature
objects (much like when reading a genome from a GenBank file you get
CDS features with their amino acid translation).
Note that for memory reasons, I would be inclined to scan over the Ace
file in one pass (using the existing Iterator in the
Bio.Sequencing.Ace parser) returning SeqRecords as we go. As Frank
points out in the code comments, this means we can't easily include
the WA, CT, RT and WR tags found in the Ace file footer. Do you use
this information Jose?
> I've implemented such a class
> and it works fine for me. In fact the Alignment class is just a wrapper
> around a standard SeqRecord (I name it RichSeq in my implementation).
> To do that you just need a SeqRecord with a __getitem__ method. You have
> already proposed that, so that's not a problem.
Your enthusiasm Jose is one of the things motivating me to try and do
more with the Seq and SeqRecord. Without a third party to offer
feedback, making big changes is risky.
> Padding with spaces is not an option when you're dealing with genome-wide
> alignments; that's one of the problems of the current Alignment class.
It might make sense to talk about a "Contig Alignment" object/class,
compared to the existing "multiple sequence alignment" object/class
where all the sequences are the same length. Ideally these should
provide as similar an API as possible - even if the internals are
different. One idea is a sub-class of the current alignment class
which stores an offset (>=0) for each supporting read, used when
accessing columns. Maybe we should check out BioPerl etc for
inspiration?
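To make the offset idea concrete, here is a rough sketch. ToyContigAlignment,
add_read and get_column are hypothetical names, and this is a standalone class
rather than a sub-class of the existing alignment class:

```python
class ToyContigAlignment(object):
    """Hypothetical contig alignment: each read is stored with an offset."""

    def __init__(self, consensus, gap="-"):
        self.consensus = consensus
        self.gap = gap
        self._reads = []  # list of (offset, sequence) pairs

    def add_read(self, offset, sequence):
        if offset < 0:
            raise ValueError("offset must be >= 0")
        self._reads.append((offset, sequence))

    def get_column(self, col):
        """Return one letter per read for consensus column col."""
        if not 0 <= col < len(self.consensus):
            raise IndexError("column out of range")
        letters = []
        for offset, sequence in self._reads:
            pos = col - offset
            if 0 <= pos < len(sequence):
                letters.append(sequence[pos])
            else:
                # The read does not cover this column. The gap is generated
                # on access; no padding is ever stored.
                letters.append(self.gap)
        return "".join(letters)


contig = ToyContigAlignment("ACGTACGT")
contig.add_read(0, "ACGT")
contig.add_read(2, "GTACGT")
```

Memory use then scales with the total read length rather than with the number
of reads multiplied by the contig length, which is the concern with padding.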
> If you want I can send my implementation to the list, although it could take a
> while because I've got my home computer dead.
Good luck with the broken computer - I hope you have an easier time
fixing it / rebuilding it than I did last time this happened to me.
Regards,
Peter
From biopython at maubp.freeserve.co.uk Tue Jun 17 05:16:29 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Jun 2008 10:16:29 +0100
Subject: [Biopython-dev] Iterating over Ace contig files
Message-ID: <320fb6e00806170216k12ecd88fof60758db1ccec3cf@mail.gmail.com>
Hello Frank,
I wanted to get your opinion on iterating over the Ace file contig by
contig, and what is lost in the WA, CT, RT and WR tags at the end of
the file by doing this. As large sequencing runs become more common,
iterating over the file in a single pass WITHOUT keeping everything in
memory does seem to be desirable.
Similar past discussions:
http://portal.open-bio.org/pipermail/biopython/2004-February/001825.html
http://portal.open-bio.org/pipermail/biopython/2005-May/002661.html
Would you object to me rewording your module's header comment to say
that the Ace Iterator is NOT deprecated, but rather that it has
certain drawbacks?
[The context for this is my recent thread on the Biopython dev mailing
list about integrating your Bio.Sequencing.Ace parser into Bio.SeqIO
and/or Bio.AlignIO - I've included a little context below.]
Thanks,
Peter
--
Peter wrote:
>> So integrating the "ace" format into Bio.SeqIO representing the
>> consensus sequence of each contig as a SeqRecord would be useful.
>> Initially I would try and represent the aligned reads as SeqFeature
>> objects (much like when reading a genome from a GenBank file you get
>> CDS features with their amino acid translation).
>>
>> Note that for memory reasons, I would be inclined to scan over the Ace
>> file in one pass (using the existing Iterator in the
>> Bio.Sequencing.Ace parser) returning SeqRecords as we go. As Frank
>> points out in the code comments, this means we can't easily include
>> the WA, CT, RT and WR tags found in the Ace file footer. Do you use
>> this information Jose?
Jose replied,
> I haven't used the iterator because of the deprecation warning of the code. I
> tried with about 40000 alignments and it worked in a computer with 8 GB of ram.
> If there are more sequences, and there will be with the 454 sequencer, we will
> have trouble reading all at once. I vote for the iterator approach. I have not
> used the information of this tag, but I don't know also what they mean. I've
> been looking for documentation about this format, but I've found none, do you
> have any good ace documentation?
From bugzilla-daemon at portal.open-bio.org Tue Jun 17 07:23:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Jun 2008 07:23:59 -0400
Subject: [Biopython-dev] [Bug 2520] New: Reading ACE assembly contig files
in Bio.SeqIO
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2520
Summary: Reading ACE assembly contig files in Bio.SeqIO
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
As I suggested on the mailing list, we could use Bio.Sequencing.Ace to parse
ACE assembly files, and then turn each contig into a SeqRecord using the
consensus sequence.
I will attach a basic implementation which only uses the consensus sequence and
its name. For now this ignores all the meta data and in particular the read
information.
From bugzilla-daemon at portal.open-bio.org Tue Jun 17 07:29:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Jun 2008 07:29:15 -0400
Subject: [Biopython-dev] [Bug 2520] Reading ACE assembly contig files in
Bio.SeqIO
In-Reply-To:
Message-ID: <200806171129.m5HBTFVG026790@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2520
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-17 07:29 EST -------
Created an attachment (id=944)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=944&action=view)
New file Bio/SeqIO/AceIO.py
This new file would be added to Bio.SeqIO in the usual way (updating
Bio/SeqIO/__init__.py to import this module and map the format "ace" to the new
iterator).
Handling different gap characters in Bio.SeqIO (and translating them when
reading and writing files) has not been formalised. Where possible, converting
them into dashes on loading seems to be a sensible route to take.
Therefore I deliberately map any "*" gap characters in the consensus sequence
into "-" characters, which are used by default in the alphabet class and are
far more commonly used. The "*" character is typically associated with a stop
codon in protein sequences, which is another reason to avoid using it here.
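The gap mapping described above comes down to a single string replacement on
the consensus sequence. A minimal sketch, assuming the consensus is available
as a plain string; the helper name is hypothetical and is not taken from the
attached AceIO.py:

```python
def consensus_with_dash_gaps(consensus):
    """Replace ACE-style "*" gap characters with "-" (hypothetical helper)."""
    # "*" marks a gap (pad) in the ACE consensus; "-" is the usual gap
    # character elsewhere, and "*" is better kept for stop codons.
    return consensus.replace("*", "-")


print(consensus_with_dash_gaps("AC*GT**A"))
```

Doing the replacement once, at load time, means downstream code only ever has
to deal with one gap character.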
From fkauff at biologie.uni-kl.de Tue Jun 17 09:06:34 2008
From: fkauff at biologie.uni-kl.de (Frank Kauff)
Date: Tue, 17 Jun 2008 15:06:34 +0200
Subject: [Biopython-dev] Iterating over Ace contig files
In-Reply-To: <320fb6e00806170216k12ecd88fof60758db1ccec3cf@mail.gmail.com>
References: <320fb6e00806170216k12ecd88fof60758db1ccec3cf@mail.gmail.com>
Message-ID: <4857B6DA.9040309@biologie.uni-kl.de>
Hi Peter,
Makes total sense to me. Feel free to make the changes as you see fit.
Frank
Peter wrote:
> Hello Frank,
>
> I wanted to get your opinion on iterating over the Ace file contig by
> contig, and what is lost in the WA, CT, RT and WR tags at the end of
> the file by doing this. As large sequencing runs become more common,
> iterating over the file in a single pass WITHOUT keeping everything in
> memory does seem to be desirable.
>
> Similar past discussions:
> http://portal.open-bio.org/pipermail/biopython/2004-February/001825.html
> http://portal.open-bio.org/pipermail/biopython/2005-May/002661.html
>
> Would you object to me rewording your module's header comment to say
> that the Ace Iterator is NOT deprecated, but rather that it has
> certain drawbacks?
>
> [The context for this is my recent thread on the Biopython dev mailing
> list about integrating your Bio.Sequencing.Ace parser into Bio.SeqIO
> and/or Bio.AlignIO - I've included a little context below.]
>
> Thanks,
>
> Peter
>
> --
>
> Peter wrote:
>
>>> So integrating the "ace" format into Bio.SeqIO representing the
>>> consensus sequence of each contig as a SeqRecord would be useful.
>>> Initially I would try and represent the aligned reads as SeqFeature
>>> objects (much like when reading a genome from a GenBank file you get
>>> CDS features with their amino acid translation).
>>>
>>> Note that for memory reasons, I would be inclined to scan over the Ace
>>> file in one pass (using the existing Iterator in the
>>> Bio.Sequencing.Ace parser) returning SeqRecords as we go. As Frank
>>> points out in the code comments, this means we can't easily include
>>> the WA, CT, RT and WR tags found in the Ace file footer. Do you use
>>> this information Jose?
>>>
>
> Jose replied,
>
>> I haven't used the iterator because of the deprecation warning in the code. I
>> tried with about 40000 alignments and it worked on a computer with 8 GB of RAM.
>> If there are more sequences, and there will be with the 454 sequencer, we will
>> have trouble reading them all at once. I vote for the iterator approach. I have
>> not used the information in these tags, and I also don't know what they mean.
>> I've been looking for documentation about this format, but I've found none. Do
>> you have any good ACE documentation?
>>
>
>
From biopython at maubp.freeserve.co.uk Tue Jun 17 09:53:23 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Jun 2008 14:53:23 +0100
Subject: [Biopython-dev] Iterating over Ace contig files
In-Reply-To: <4857B6DA.9040309@biologie.uni-kl.de>
References: <320fb6e00806170216k12ecd88fof60758db1ccec3cf@mail.gmail.com>
<4857B6DA.9040309@biologie.uni-kl.de>
Message-ID: <320fb6e00806170653g482b104fl739107fcada06dc8@mail.gmail.com>
On Tue, Jun 17, 2008 at 2:06 PM, Frank Kauff wrote:
> Hi Peter,
>
> Makes total sense to me. Feel free to make the changes as you see fit.
>
> Frank
Thanks Frank.
I've checked in some comment changes to both Ace.py and Phd.py, aimed
at both improving the documentation and trying to make epydoc happier
for the automatic API documentation:
http://biopython.org/DIST/docs/api/
Peter
P.S. I also added an __iter__ method to the Ace Iterator (Phd already had one).
From mjldehoon at yahoo.com Tue Jun 17 10:08:31 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 17 Jun 2008 07:08:31 -0700 (PDT)
Subject: [Biopython-dev] Iterating over Ace contig files
In-Reply-To: <320fb6e00806170653g482b104fl739107fcada06dc8@mail.gmail.com>
Message-ID: <399611.60966.qm@web62415.mail.re1.yahoo.com>
Note that bug #2454 also pertains to the Ace and Phd parsers. If you are modifying the Ace and Phd parsers, can you fix this bug at the same time?
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
--Michiel.
From bugzilla-daemon at portal.open-bio.org Tue Jun 17 10:43:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Jun 2008 10:43:42 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200806171443.m5HEhgua005645@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-17 10:43 EST -------
I've removed the strict file-like test in:
Bio/Sequencing/Ace.py revision: 1.12
Bio/Sequencing/Phd.py revision: 1.6
In these cases, the handle is immediately turned into an UndoHandle which will
be able to check for a sufficiently file like object.
Hopefully that's what you meant Michiel - we could go further and introduce a
parse() function and deprecate the Iterator objects in these modules.
From bugzilla-daemon at portal.open-bio.org Wed Jun 18 06:34:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 18 Jun 2008 06:34:43 -0400
Subject: [Biopython-dev] [Bug 2503] An error when parsing NCBIWWW Blast
output
In-Reply-To:
Message-ID: <200806181034.m5IAYhS1026214@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2503
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |INVALID
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-18 06:34 EST -------
I'm closing this bug as "INVALID" due to a lack of information.
If you are still having trouble Prashantha, and can give us some more
information, please re-open this bug.
Thank you.
Peter
From bugzilla-daemon at portal.open-bio.org Wed Jun 18 07:34:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 18 Jun 2008 07:34:26 -0400
Subject: [Biopython-dev] [Bug 2497] Unit tests do not cover
Bio.Blast.NCBIWWW.qblast()
In-Reply-To:
Message-ID: <200806181134.m5IBYQjC032061@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2497
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-18 07:34 EST -------
I checked in a slightly revised version of this as test_NCBI_qblast.py -
marking this bug as fixed.
From bugzilla-daemon at portal.open-bio.org Wed Jun 18 08:01:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 18 Jun 2008 08:01:11 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
for '-F F' and make it safe
In-Reply-To:
Message-ID: <200806181201.m5IC1BxA001255@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2508
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-18 08:01 EST -------
Created an attachment (id=946)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=946&action=view)
Patch to Bio/Blast/NCBIStandalone.py and Tests/test_NCBIStandalone.py
Suggested patch for the command injection risk.
Can anyone think of a legitimate reason for a ; or & character in the
parameters of a BLAST command line? This patch is very simple and will reject
any keyword parameter containing the ; or && characters.
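As a rough illustration of the kind of check described above (not the attached
patch itself), a sketch could look like the following. The function name and
the error message are assumptions made for this example only:

```python
def check_parameters(params):
    """Raise ValueError if any parameter value contains ";" or "&&".

    Hypothetical helper: 'params' is a dict of keyword parameters that
    would later be turned into a BLAST command line.
    """
    for key, value in params.items():
        text = str(value)  # values may be ints or floats as well as strings
        for bad in (";", "&&"):
            if bad in text:
                raise ValueError("Rejecting suspicious argument for %s" % key)


# Ordinary parameters pass silently
check_parameters({"matrix": "BLOSUM62", "expectation": 10})

# An attempt to append a second command is rejected
try:
    check_parameters({"matrix": "IDENTITY; echo injected"})
except ValueError as err:
    print(err)
```

A blacklist like this is deliberately simple; it only stops the obvious ways
of chaining a second command onto the command line.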
From biopython at maubp.freeserve.co.uk Wed Jun 18 10:00:56 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 18 Jun 2008 15:00:56 +0100
Subject: [Biopython-dev] SeqRecord to file format as string
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B63C@mail2.exch.c2b2.columbia.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
<47147341.4020708@maubp.freeserve.co.uk>
<7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
<4714EB8E.3000700@maubp.freeserve.co.uk>
<6243BAA9F5E0D24DA41B27997D1FD14402B63C@mail2.exch.c2b2.columbia.edu>
Message-ID: <320fb6e00806180700k327e6913m7ba9c4bdc3421f67@mail.gmail.com>
This returns to a thread from last year about getting a SeqRecord
into a string in a particular file format (e.g. fasta). Jared Flatow
had suggested adding a method to the SeqRecord itself.
Jared wrote:
> > ... To always have to write to a file feels strange, but I see
> > that it would be messy to go OO since there are so many formats.
> > However, giving preference to fasta over other formats by making it
> > innate doesn't seem like such a terrible idea. I do have mixed
> > feelings about 'bloating' the code which is why I asked, and you have
> > convinced me that this is not quite appropriate given existing
> > convention. However the idea would be to put the to_fasta or
> > to_format method inside the SeqRecord, then to call it from the IO
> > when needed to actually write to a file, but call it directly when
> > all that is wanted is a string...
>
> It's debatable, isn't it? I suspect that for most users, when they want a
> record in a particular file format it's for writing to a file. However,
> adding a to_format() method to a SeqRecord makes some sense (suitable for
> sequential file formats only). This would take a format name and return
> a string, by calling Bio.SeqIO with a StringIO object internally.
>
> Peter
Jared - On reflection, do you think adding a method like this to the
SeqRecord (or even just for the FASTA format) would be useful?
I recently found myself wanting to use this sort of functionality, and
remembered this old thread. This time I was wondering about using the
method name tostring (matching the name of a Seq object method). In
order to mimic the Seq object's method, the format would be optional
and when omitted would give the sequence as a string. Otherwise one
of the lower case strings used in Bio.SeqIO should be supplied. There
is a sample implementation at the end of this email.
On Wed, Oct 17, 2007 Michiel De Hoon wrote:
> How about the following:
>
> SeqIO.write(sequences, handle, format) returns the properly formatted string
> if handle==None.
I can see the above is simpler than having to supply a StringIO
handle, but it doesn't make the functionality available directly from
the SeqRecord object. It also complicates the API of the SeqIO module
with a special case.
Peter
--
######################################
For the SeqRecord class, in Bio/SeqRecord.py
######################################
def tostring(self, format=None) :
    """Returns the record as a string in the specified file format.
    If the file format is omitted (default), the sequence itself is
    returned as a string.
    Otherwise the format should be a lower case string supported by
    Bio.SeqIO, which is used to turn the SeqRecord into a string."""
    if format :
        from StringIO import StringIO
        from Bio import SeqIO
        handle = StringIO()
        SeqIO.write([self], handle, format)
        handle.seek(0)
        return handle.read()
    else :
        #Return the sequence as a string
        return self.seq.tostring()
############################################
From jflatow at northwestern.edu Wed Jun 18 11:25:18 2008
From: jflatow at northwestern.edu (Jared Flatow)
Date: Wed, 18 Jun 2008 10:25:18 -0500
Subject: [Biopython-dev] SeqRecord to file format as string
In-Reply-To: <4D53AB82-F673-4F4F-BCEC-BA06088E8721@northwestern.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
<47147341.4020708@maubp.freeserve.co.uk>
<7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
<4714EB8E.3000700@maubp.freeserve.co.uk>
<6243BAA9F5E0D24DA41B27997D1FD14402B63C@mail2.exch.c2b2.columbia.edu>
<320fb6e00806180700k327e6913m7ba9c4bdc3421f67@mail.gmail.com>
<4D53AB82-F673-4F4F-BCEC-BA06088E8721@northwestern.edu>
Message-ID: <55567F98-C5F5-4A2F-8542-502F17F485E9@northwestern.edu>
Quick correction:
On Jun 18, 2008, at 10:16 AM, Jared Flatow wrote:
> Hi Peter,
>
> On Jun 18, 2008, at 9:00 AM, Peter wrote:
>
>> Jared - On reflection, do you think adding a method like this to the
>> SeqRecord (or even just for the FASTA format) would be useful?
>
> Yes I still think so. In fact, for sequences, I would say that I
> pretty much never deal with a format other than FASTA, so even making
> the __str__ method of SeqRecord return the FASTA format as well
> seems reasonable, though perhaps my use cases are different than
> others.
>
> However, py3k and 2.6 will make available the functionality
> described in PEP 3101:
>
> http://www.python.org/dev/peps/pep-3101/
>
> I think it would be best to define some semantics that are
> compatible with this PEP. This would basically mean using the
> __format__ method (which could be the same as the tostring method
> you have defined below). To achieve backward compatibility and/or a
> more OO interface, tostring could just be an alias for __format__.
> Thus, instead of calling format(seq_rec, 'fasta') one could call
> seq_rec.tostring('fasta') and these would be equivalent. The PEP
> also states that format(seq_rec) should be the same as str(seq_rec).
On second thought it seems like a .format method (similar to the one
the string class is acquiring) should be used as an alias to
__format__ (somehow I think tostring should always be the same as
__str__)
> In short, I think creating methods to return formatted versions of
> objects (SeqRecords) is a good idea, but most especially if it is
> done in a way consistent with the language's vision.
>
> Best,
> jared
From bugzilla-daemon at portal.open-bio.org Wed Jun 18 11:36:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 18 Jun 2008 11:36:48 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200806181536.m5IFamvB015695@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #16 from mdehoon at ims.u-tokyo.ac.jp 2008-06-18 11:36 EST -------
(In reply to comment #15)
> I've removed the strict file-like test in:
>
> Bio/Sequencing/Ace.py revision: 1.12
> Bio/Sequencing/Phd.py revision: 1.6
>
> In these cases, the handle is immediately turned into an UndoHandle which will
> be able to check for a sufficiently file like object.
>
> Hopefully that's what you meant Michiel
Actually, I think we should avoid using an UndoHandle altogether, now that
Python has generator functions.
> - we could go further and introduce a
> parse() function and deprecate the Iterator objects in these modules.
>
That would make things a lot easier. An Iterator class was useful in older
versions of Python, but generator functions provide a cleaner alternative.
In Ace.py, we'd need three functions:
1) read(handle), which returns one record (Contig) read from the handle, and
None otherwise;
2) parse(handle), a generator function returning an iterator over the records;
3) a local function _process_line(line, record)
These functions then look like this:
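def read(handle):
    # iter() lets this work for a list of lines as well as a file handle
    handle = iter(handle)
    # Skip ahead to the first contig (CO) line
    for line in handle:
        if line[:2]=='CO':
            break
    else:
        # No contig found
        return None
    record = Contig()
    _process_line(line, record)
    for line in handle:
        if line[:2]=='CO':
            # Start of the next contig, so this record is complete
            return record
        else:
            _process_line(line, record)
    # End of the input reached, so return the last contig
    return record

def parse(handle):
    record = None
    for line in handle:
        if line[:2]=='CO':
            if record is not None:
                yield record
            record = Contig()
        if record is not None:
            _process_line(line, record)
    if record is not None:
        yield record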
The actual work is done in _process_line.
So we don't need to store the read lines explicitly; this is now taken care of
by the generator function. Hence, we don't need to convert the handle to an
UndoHandle. In addition, handle can now also be a list of lines instead of a
file handle. In this respect, I think Zachary was right in comment #11:
> Maybe it's a good idea for any parsers/iterators to just
> use the iterator-like ability of file handles?
In other words, as long as we can pull lines from the handle, we can parse it.
In Phd.py, it's even simpler. Here, we only need the read() and parse()
function:
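def read(handle):
    record = None
    for line in handle:
        if line.startswith("BEGIN_SEQUENCE"):
            record = Record()
        elif line.startswith("END_SEQUENCE"):
            return record
        else:
            # do the actual processing of the other lines here
            pass
    # No (further) complete record found
    return None

def parse(handle):
    # iter() makes repeated read() calls continue where the last one stopped
    handle = iter(handle)
    while True:
        record = read(handle)
        if record is None:
            return
        yield record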
Again, we can process each line just as they come along. No UndoHandle, no
Parser, no Consumer, no Scanner needed.
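To make the idea concrete, here is a small self-contained sketch of a
generator-based parser for a PHD-like layout. It is only an illustration: the
Record class below is a hypothetical stand-in that just collects raw lines,
and real PHD files need much more processing than this.

```python
class Record(object):
    """Hypothetical minimal record; the real Phd Record holds far more."""

    def __init__(self, name):
        self.name = name
        self.lines = []


def parse(handle):
    """Yield one Record per BEGIN_SEQUENCE ... END_SEQUENCE block.

    'handle' can be anything that yields lines: an open file, a StringIO
    object, or a plain list of strings.
    """
    record = None
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("BEGIN_SEQUENCE"):
            parts = line.split()
            record = Record(parts[1] if len(parts) > 1 else "")
        elif line.startswith("END_SEQUENCE"):
            if record is not None:
                yield record
            record = None
        elif record is not None:
            # A real parser would interpret the line here
            record.lines.append(line)


example = [
    "BEGIN_SEQUENCE read1\n",
    "a 10 1\n",
    "END_SEQUENCE\n",
    "BEGIN_SEQUENCE read2\n",
    "END_SEQUENCE\n",
]
for rec in parse(example):
    print(rec.name, len(rec.lines))
```

Because the generator keeps its own state between records, nothing has to be
pushed back onto the handle, which is the job the UndoHandle was doing.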
From jflatow at northwestern.edu Wed Jun 18 11:16:59 2008
From: jflatow at northwestern.edu (Jared Flatow)
Date: Wed, 18 Jun 2008 10:16:59 -0500
Subject: [Biopython-dev] SeqRecord to file format as string
In-Reply-To: <320fb6e00806180700k327e6913m7ba9c4bdc3421f67@mail.gmail.com>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
<47147341.4020708@maubp.freeserve.co.uk>
<7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
<4714EB8E.3000700@maubp.freeserve.co.uk>
<6243BAA9F5E0D24DA41B27997D1FD14402B63C@mail2.exch.c2b2.columbia.edu>
<320fb6e00806180700k327e6913m7ba9c4bdc3421f67@mail.gmail.com>
Message-ID: <4D53AB82-F673-4F4F-BCEC-BA06088E8721@northwestern.edu>
Hi Peter,
On Jun 18, 2008, at 9:00 AM, Peter wrote:
> Jared - On reflection, do you think adding a method like this to the
> SeqRecord (or even just for the FASTA format) would be useful?
Yes I still think so. In fact, for sequences, I would say that I
pretty much never deal with a format other than FASTA, so even making
the __str__ method of SeqRecord return the FASTA format as well seems
reasonable, though perhaps my use cases are different than others.
However, py3k and 2.6 will make available the functionality described
in PEP 3101:
http://www.python.org/dev/peps/pep-3101/
I think it would be best to define some semantics that are compatible
with this PEP. This would basically mean using the __format__ method
(which could be the same as the tostring method you have defined
below). To achieve backward compatibility and/or a more OO interface,
tostring could just be an alias for __format__. Thus, instead of
calling format(seq_rec, 'fasta') one could call
seq_rec.tostring('fasta') and these would be equivalent. The PEP also
states that format(seq_rec) should be the same as str(seq_rec).
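A minimal sketch of how this could look, assuming Python 2.6 or later. The
class below is a hypothetical stand-in for SeqRecord (it only has an id and a
plain string sequence), and the FASTA output is hand-written rather than
produced by Bio.SeqIO:

```python
class Record(object):
    """Hypothetical stand-in for SeqRecord, for illustration only."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq

    def __str__(self):
        return "ID: %s\nSeq: %s" % (self.id, self.seq)

    def __format__(self, format_spec):
        # PEP 3101: an empty format spec should behave like str()
        if not format_spec:
            return str(self)
        if format_spec == "fasta":
            return ">%s\n%s\n" % (self.id, self.seq)
        raise ValueError("Unsupported format: %r" % format_spec)

    def tostring(self, format_spec=""):
        # Alias giving a method-call spelling of format(record, spec)
        return self.__format__(format_spec)


rec = Record("demo", "ACGT")
print(format(rec, "fasta"))
print(rec.tostring("fasta") == format(rec, "fasta"))
print(format(rec) == str(rec))
```

In a real implementation the "fasta" branch would presumably hand off to
Bio.SeqIO so that every supported format name works the same way.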
In short, I think creating methods to return formatted versions of
objects (SeqRecords) is a good idea, but most especially if it is done
in a way consistent with the language's vision.
Best,
jared
From yair.benita at gmail.com Wed Jun 18 13:26:02 2008
From: yair.benita at gmail.com (Yair Benita)
Date: Wed, 18 Jun 2008 13:26:02 -0400
Subject: [Biopython-dev] BioPax parser
Message-ID:
Hi Guys,
Does anyone have a biopax parser written in python?
Thanks,
Yair
From biopython at maubp.freeserve.co.uk Wed Jun 18 13:42:13 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 18 Jun 2008 18:42:13 +0100
Subject: [Biopython-dev] BioPax parser
In-Reply-To:
References:
Message-ID: <320fb6e00806181042y169f580epbd8c876eb3cb57fa@mail.gmail.com>
On Wed, Jun 18, 2008 at 6:26 PM, Yair Benita wrote:
> Hi Guys,
> Does anyone have a biopax parser written in python?
> Thanks,
> Yair
I don't know of any (but I haven't searched). From a quick look on
www.biopax.org they use XML, so you should be able to parse it in
python fairly easily - but I guess some sort of object orientated
representation of the data would be very nice to have.
Peter
From bugzilla-daemon at portal.open-bio.org Thu Jun 19 06:08:55 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 19 Jun 2008 06:08:55 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
for '-F F' and make it safe
In-Reply-To:
Message-ID: <200806191008.m5JA8t0v016495@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2508
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-19 06:08 EST -------
On the issue of the low-complexity filter, that is actually already supported
in NCBIStandalone.blastall(), NCBIStandalone.blastpgp() and
NCBIStandalone.rpsblast() using the optional argument 'filter'. This is
described in the doc string too, although it doesn't use the phrase "low
complexity" which might be clearer.
From bugzilla-daemon at portal.open-bio.org Thu Jun 19 06:20:03 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 19 Jun 2008 06:20:03 -0400
Subject: [Biopython-dev] [Bug 2494] _retrieve_taxon in BioSQL.py needs
urgent optimization
In-Reply-To:
Message-ID: <200806191020.m5JAK3OZ017201@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2494
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-19 06:20 EST -------
I'm marking this as fixed now, but if anyone does find an issue with it please
re-open the bug. Thanks for your work on this Eric.
Peter
From bugzilla-daemon at portal.open-bio.org Thu Jun 19 06:41:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 19 Jun 2008 06:41:22 -0400
Subject: [Biopython-dev] [Bug 2408] GenBank records do not contain U's
In-Reply-To:
Message-ID: <200806191041.m5JAfMNK018058@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2408
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-06-19 06:41 EST -------
Given there were no other opinions voiced on how to handle this, I went ahead
and fixed this in Bio/GenBank/__init__.py CVS revision 1.83
For records from RNA, if the sequence contains T but not U, we will use a DNA
alphabet in the Seq object.
Thanks for raising this Marcin.
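The rule above can be sketched roughly as follows. This is an illustration of
the decision only, not the code checked in to Bio/GenBank/__init__.py; the
function name and the string return values are assumptions, and it only
considers nucleotide records:

```python
def guess_nucleotide_type(molecule_type, sequence):
    """Return "RNA" or "DNA" for a nucleotide record (hypothetical helper).

    'molecule_type' is the molecule type text from the record header
    (for example "mRNA" or "DNA"); 'sequence' is the sequence string.
    """
    seq = sequence.upper()
    if "RNA" in molecule_type.upper():
        # Records labelled as RNA are often written with T instead of U
        if "T" in seq and "U" not in seq:
            return "DNA"
        return "RNA"
    return "DNA"


print(guess_nucleotide_type("mRNA", "atgcatgc"))
print(guess_nucleotide_type("mRNA", "augcaugc"))
```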
From mjldehoon at yahoo.com Thu Jun 19 09:04:31 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 19 Jun 2008 06:04:31 -0700 (PDT)
Subject: [Biopython-dev] Bio.CDD, anyone?
Message-ID: <14893.84074.qm@web62409.mail.re1.yahoo.com>
Hi everybody,
Bio.CDD is a module with a parser for CDD (NCBI's Conserved Domain Database) records. The parser parses HTML pages from CDD's web site. Since the parser was written about six years ago, the CDD web site has changed considerably. Bio.CDD therefore cannot parse current HTML pages from CDD.
So I am wondering:
1) Is anybody using Bio.CDD?
2) Is anybody willing to update Bio.CDD to handle current HTML?
3) If not, can we deprecate it? There is not much point in having a parser for HTML pages from years ago.
--Michiel.
From biopython at maubp.freeserve.co.uk Thu Jun 19 09:38:29 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 19 Jun 2008 14:38:29 +0100
Subject: [Biopython-dev] Bio.CDD, anyone?
In-Reply-To: <14893.84074.qm@web62409.mail.re1.yahoo.com>
References: <14893.84074.qm@web62409.mail.re1.yahoo.com>
Message-ID: <320fb6e00806190638y2e3729e1ga66561de0c962700@mail.gmail.com>
> Bio.CDD is a module with a parser for CDD (NCBI's Conserved Domain Database)
> records. The parser parses HTML pages from CDD's web site. Since the parser
> was written about six years ago, the CDD web site has changed considerably.
> Bio.CDD therefore cannot parse current HTML pages from CDD.
A couple of years ago, I wanted to get the CDD domain name and
description and ended up writing my own very simple and crude parser
to extract just this information. Doing a proper job would mean
extracting lots and lots of fields, e.g.
http://www.nc