[Biopython-dev] Working with the new SearchIO API
Wibowo Arindrarto
w.arindrarto at gmail.com
Thu Nov 1 08:19:58 UTC 2012
Hi Kai, Michiel,
(I hope this gets through to the mailing list. I'm also CC-ing several
people from the discussion, just in case.)
I've made a new branch based on Kai's SearchIO rebase here:
https://github.com/bow/biopython/tree/searchio-rebase, with the
following important changes:
>> Does anyone have a preference between '.acc' and '.accession'? If not, I
>> can change the current '.acc' into '.accession'.
>
> I would prefer .accession for clarity.
1. All accession attributes now use the 'accession' name
(https://github.com/bow/biopython/commit/002b08df91040e6bcf3f0dd3d087b3d378005632).
The related blast-tab attribute, which holds the accession number
together with its version, has also been renamed from 'acc_ver' to
'accession_version'. The docs have been updated accordingly.
> See the attached hmmpfam output. You'll notice that the domain table
> is not in the order of the hit table. As I'd like to preserve the
> order of the hit table, the current setup of the API forces me to
> either repeatedly parse the domain annotations until I find the
> correct domain annotations for my hit, or to create the hits in the
> order of the domain annotation table and then reshuffle them to make
> sure they're in the order of the hit table.
>
> If I could just create "empty" hit objects when parsing the hit table,
> I could easily preserve the order of the hits but still add the hsps
> as I parse them.
2. Regarding the Hit object API change, Hit objects can now be created
without any HSPs
(https://github.com/bow/biopython/commit/e9137c9ed88c09f6e488f50184292cac474327c4).
However, in line with my earlier point about storing each value in as
few places as possible (here, the hit and query IDs and descriptions),
an empty Hit object will raise an error if any of these attributes is
accessed. Getting and setting them only works once the Hit contains at
least one HSP. Other Hit methods, such as append, should work fine as
long as they don't involve these attributes. I think this will allow
parsing of formats like HMMER2 plain text while keeping the
single-storage constraint.
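To make the trade-off concrete, here is a minimal, hypothetical sketch of
the design described in point 2. It is NOT the actual Biopython SearchIO
code: the HSP and Hit classes below are simplified stand-ins, and the
choice of AttributeError for an empty Hit and ValueError for a mismatched
HSP are my own assumptions. The hit/query IDs and descriptions live only
on the HSPs, the Hit reads and writes them through properties, and the
usage part mimics Kai's hmmpfam case with made-up hit IDs: empty Hits are
created in hit-table order and filled from an out-of-order domain table.

```python
# Hypothetical sketch, NOT Biopython's real SearchIO classes. It only models
# the design above: IDs live on the HSPs alone, and an empty Hit refuses to
# read or write them.


class HSP:
    """Minimal stand-in HSP; the single place where IDs are stored."""

    def __init__(self, hit_id, query_id, hit_description="",
                 query_description="", evalue=None):
        self.hit_id = hit_id
        self.query_id = query_id
        self.hit_description = hit_description
        self.query_description = query_description
        self.evalue = evalue


def _hsp_backed(hsp_attr):
    """Build a Hit property that reads/writes `hsp_attr` on the Hit's HSPs."""

    def getter(self):
        return getattr(self._require_hsps(hsp_attr)[0], hsp_attr)

    def setter(self, value):
        # Keep every HSP in sync, since they are the only storage.
        for hsp in self._require_hsps(hsp_attr):
            setattr(hsp, hsp_attr, value)

    return property(getter, setter)


class Hit:
    """Container of HSPs that keeps no copy of the hit/query IDs itself."""

    id = _hsp_backed("hit_id")
    description = _hsp_backed("hit_description")
    query_id = _hsp_backed("query_id")
    query_description = _hsp_backed("query_description")

    def __init__(self, hsps=()):
        self._items = []
        for hsp in hsps:
            self.append(hsp)

    def _require_hsps(self, attr):
        # Assumed behaviour: ID attributes are unavailable on an empty Hit.
        if not self._items:
            raise AttributeError(
                f"{attr!r} is unavailable until the Hit holds an HSP")
        return self._items

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def append(self, hsp):
        # An empty Hit accepts any HSP without touching the ID attributes;
        # afterwards, new HSPs must agree with the IDs already stored.
        if self._items and (hsp.hit_id != self.id
                            or hsp.query_id != self.query_id):
            raise ValueError("HSP IDs do not match the Hit's existing HSPs")
        self._items.append(hsp)


# Usage mimicking an hmmpfam-style report: create empty Hits in hit-table
# order, then fill them from a domain table that lists hits in another order.
hit_table = ["hitA", "hitB"]                        # made-up hit IDs
domain_table = [("hitB", 1e-20), ("hitA", 1e-30), ("hitB", 1e-5)]

hits = {hit_id: Hit() for hit_id in hit_table}      # dicts keep insertion order
for hit_id, evalue in domain_table:
    hits[hit_id].append(HSP(hit_id, "query1", evalue=evalue))

print([hit.id for hit in hits.values()])            # ['hitA', 'hitB']
print(len(hits["hitB"]))                            # 2

try:
    Hit().id
except AttributeError as exc:
    print("empty Hit:", exc)
```

Because the Hit never stores its own copy of the IDs, renaming a hit
(`hit.id = ...`) updates every HSP at once, at the cost of the ID being
undefined until the first HSP arrives.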
Hope these help :).
regards,
Bow