[Biopython] Regarding Bio.SearchIO Hmmer Use Case

Wed Aug 5 04:00:01 UTC 2015

Hello,

I have been using Biopython and SearchIO for a little over three weeks.

OS: Ubuntu 14.04
Biopython: 1.65 (Bio.__version__)
HMMER: 3.1b2
Code: <https://github.com/gprakhar/scripts-biopython/tree/alpha> (GitHub,
branch alpha)

Briefly: when an HMMER3 text output file is parsed with 'SearchIO.parse()'
and the file contains output from more than one search, only the first
result is read and the rest are ignored.

Details: I ran 'hmmsearch' with Pfam models against a database of
proteins. Because I wanted to group some domain models together (all of
these domains belong to a single protein), I concatenated the results of
the separate runs into a single text file.

When I use SearchIO.parse() to read this file, the QueryResult object has
only the top entry (the result for the first HMM model); iterating over it
does not give the other results.
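Roughly, the check looks like the sketch below. This is not the exact
script from the repo: the helper name 'list_query_ids' is made up for this
example, and it assumes the iteration is over the generator that
SearchIO.parse() returns, with 'hmmer3-text' as the format name.

```python
def list_query_ids(path, fmt="hmmer3-text"):
    """Return the ID of every QueryResult that SearchIO.parse() yields.

    `list_query_ids` is a hypothetical helper for illustration only.
    """
    # Imported inside the function so the sketch can be loaded even on a
    # machine where Biopython is not installed.
    from Bio import SearchIO

    # SearchIO.parse() returns a generator of QueryResult objects,
    # one per query found in the file.
    return [qresult.id for qresult in SearchIO.parse(path, fmt)]
```

For a file with two concatenated hmmsearch results, such as 'Q9UP38.out',
I would expect 'list_query_ids("Q9UP38.out")' to return two IDs, but only
the first result is read.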

To test this, I split the results file so that each file holds the results
of a single hmmsearch run; after that, parsing works fine.
But that use case (a single result per file) is what 'SearchIO.read()' is
for. Consequently, I would expect 'SearchIO.parse()' to be able to parse
multiple results from a single file.
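The splitting workaround can also be done in memory instead of writing
separate files. The following is only a sketch under stated assumptions:
'split_runs' and 'parse_concatenated' are hypothetical names, and it
assumes that the text output of every hmmsearch run begins with a banner
line starting with '# hmmsearch ::'.

```python
import io

# Assumption: each hmmsearch run's text output starts with this banner.
RUN_BANNER = "# hmmsearch ::"


def split_runs(text, banner=RUN_BANNER):
    """Split concatenated hmmsearch text output into one string per run."""
    runs, current = [], []
    for line in text.splitlines(keepends=True):
        # A new banner closes the run collected so far.
        if line.startswith(banner) and current:
            runs.append("".join(current))
            current = []
        current.append(line)
    if current:
        runs.append("".join(current))
    # Drop chunks that are only whitespace (e.g. blank lines before a banner).
    return [run for run in runs if run.strip()]


def parse_concatenated(path, fmt="hmmer3-text"):
    """Yield QueryResult objects from every run in a concatenated file."""
    # Imported lazily so split_runs() is usable without Biopython.
    from Bio import SearchIO

    with open(path) as handle:
        text = handle.read()
    for run in split_runs(text):
        # Each run is parsed on its own, as if it were a separate file.
        for qresult in SearchIO.parse(io.StringIO(run), fmt):
            yield qresult
```

With this, 'for qresult in parse_concatenated("Q9UP38.out")' would visit
each run's results in turn, which is the behaviour I expected from a plain
'SearchIO.parse()' call.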

As a test case from my repo
<https://github.com/gprakhar/scripts-biopython/tree/alpha>, first run the
script 'hmmer-SearchIO-text-parser.py'; it reads the output file
'Q9UP38.out', which represents the use case in which the result file has
multiple (2) entries.

Are my expectations of 'SearchIO.parse()' correct? Or am I missing
something?

Replies, comments, pointers, etc. are all welcome.

-- 
Thanking you,
-- 
Prakhar Gaur