[Biopython] Regarding Bio.SearchIO Hmmer Use Case

Wed Aug 12 11:54:25 UTC 2015

Hmmer mailing list - this is for your record. SOLVED.

Hi Bow,

Quick update:
the methodology you suggested works fine.
SearchIO's hmmer parser is able to parse the results from a single output
file containing results from multiple hmm models.
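
For the archive, the workflow can be sketched in Python roughly as follows.
This is a minimal sketch under assumptions, not the script from the repo: the
helper names (`concatenate_hmms`, `hmmsearch_command`, `query_ids`,
`run_pipeline`) are hypothetical. Only `hmmsearch -o`, `SearchIO.parse()` and
the `hmmer3-text` format name come from HMMER and Biopython themselves.

```python
import shutil
import subprocess
from pathlib import Path


def concatenate_hmms(hmm_paths, combined_path):
    """Join several HMM model files into one, like `cat a.hmm b.hmm > c.hmm`."""
    with open(combined_path, "wb") as out:
        for path in hmm_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(combined_path)


def hmmsearch_command(combined_hmm, sequence_db, output_path):
    """Build the hmmsearch call; -o writes the text report to a file."""
    return ["hmmsearch", "-o", str(output_path), str(combined_hmm), str(sequence_db)]


def query_ids(output_path):
    """Return the id of every QueryResult found in a hmmer3 text report."""
    # Imported here so the helpers above can be used without Biopython.
    from Bio import SearchIO

    return [qresult.id for qresult in SearchIO.parse(str(output_path), "hmmer3-text")]


def run_pipeline(hmm_paths, sequence_db, workdir):
    """Concatenate the models, run one hmmsearch, and list the parsed queries."""
    workdir = Path(workdir)
    combined = concatenate_hmms(hmm_paths, workdir / "combined.hmm")
    report = workdir / "combined.out"
    subprocess.run(hmmsearch_command(combined, sequence_db, report), check=True)
    return query_ids(report)
```

Calling `run_pipeline` with the list of model files, the protein database and
a working directory should then give one id per model, since a single
hmmsearch run writes all query results into one report.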

Thank you for the help.

Regards,
--
prakhar

On Wed, Aug 5, 2015 at 10:27 PM, Prakhar Gaur <prakhar.aaidu16 at gmail.com>
wrote:

> Hi Bow,
>
> Thank you for the prompt reply.
>
> Now I am clear about what was going wrong; I will rerun the Hmmer analysis
> with the hmm model files concatenated.
> I am travelling now, and since this analysis was run on our Institute's
> HPC, I will have to wait till next week to test this out and reply.
>
> Regards,
> --
> prakhar
>
> On Wed, Aug 5, 2015 at 12:37 PM, Wibowo Arindrarto <w.arindrarto at gmail.com>
> wrote:
>
>> Hi Prakhar,
>>
>> Thanks for uploading your script and test case, first of all. It helps a
>> lot :).
>>
>> As for your question, indeed I saw that your output is a concatenation
>> of two hmmsearch results. If you would like to parse all query
>> results, instead you should run hmmsearch with a single input file
>> containing all of your queries. In other words, instead of
>> concatenating the results, you should concatenate the input HMMs (a
>> simple `cat` should do). You will still see the results of each query
>> separately, and I don't expect them to influence each other.
>>
>> This is because the output file begins and ends with a specific
>> section. When you concatenate two output files, the parser sees that
>> it has encountered the end section of an output file and stops
>> parsing. If you run two input HMMs instead, the parser can then see
>> all the results before encountering the output file's end section.
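
The stop-at-the-end-section behaviour described above can be checked before
parsing. A minimal sketch, assuming that the end section of a HMMER 3.1 text
report is a line reading `[ok]`. The function names are hypothetical and this
is not how the SearchIO parser itself is written:

```python
def count_end_markers(lines):
    """Count lines that look like HMMER's closing '[ok]' marker."""
    return sum(1 for line in lines if line.strip() == "[ok]")


def looks_concatenated(lines):
    """True when more than one end marker appears, i.e. reports were joined."""
    return count_end_markers(lines) > 1


# Tiny stand-ins for report contents; real reports are much longer.
single_report = ["Query:       A  [M=10]", "//", "[ok]"]
joined_reports = single_report + ["Query:       B  [M=12]", "//", "[ok]"]

print(looks_concatenated(single_report))
print(looks_concatenated(joined_reports))
```

For a real file, passing the open file handle as `lines` works the same way,
and a positive result suggests rerunning hmmsearch on the concatenated HMMs
rather than parsing the joined reports.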
>>
>> Hope this helps :),
>> Bow
>>
>> On Wed, Aug 5, 2015 at 6:00 AM, Prakhar Gaur <prakhar.aaidu16 at gmail.com>
>> wrote:
>> > Hello,
>> >
>> > Biopython & SearchIO user for over three weeks.
>> >
>> > OS - Ubuntu 14.04
>> > Biopython Bio.__version__ 1.65
>> > HMMER -- 3.1b2
>> > Code is here (github) - branch alpha
>> >
>> > Briefly, if a hmmer 3 text output is parsed using 'SearchIO.parse()' and
>> > that file has output from more than one search, only the first result is
>> > read and the rest are ignored.
>> >
>> > Details -- I used 'hmmsearch' with pfam models against a database of
>> > proteins. Since I wanted to club some domain models together, I
>> > concatenated the results into a single text file, as all the domains
>> > concatenated together belong to a single protein.
>> >
>> > When I use SearchIO.parse() to read this file, the QueryResult object has
>> > only the top entry (the first hmm model's result); iterating over the
>> > QueryResult does not give the other results.
>> >
>> > To test this, I split the results file so that each file holds the results
>> > of a single hmmsearch run; after this, parse works fine.
>> > But this use case (single result in one file) is for 'SearchIO.read()'.
>> > Consequently I would expect 'SearchIO.parse()' to be able to parse multiple
>> > results from a single file.
>> >
>> > As a test case from my repo, first run the script
>> > 'hmmer-SearchIO-text-parser.py'; the output file 'Q9UP38.out' will be read.
>> > This represents the use case in which the result file has multiple (2)
>> > entries.
>> >
>> > Are my expectations of 'SearchIO.parse()' correct? Or am I missing
>> > something?
>> >
>> > Replies, comments, pointers etc. are all welcome.
>> >
>> > --
>> > Thanking you,
>> > --
>> > Prakhar Gaur
>> >
>> >
>> > _______________________________________________
>> > Biopython mailing list  -  Biopython at mailman.open-bio.org
>> > http://mailman.open-bio.org/mailman/listinfo/biopython
>>
>
>
>
> --
> Thanking you,
> --
> Prakhar Gaur
>
>

-- 
Thanking you,
-- 
Prakhar Gaur