[Biopython] Regarding Bio.SearchIO Hmmer Use Case

Prakhar Gaur prakhar.aaidu16 at gmail.com
Wed Aug 5 16:57:54 UTC 2015


Hi Bow,

Thank you for the prompt reply.

I now understand what was going wrong. I will rerun the HMMER analysis
with the HMM model files concatenated.
I am travelling at the moment, and since this analysis was run on our
institute's HPC, I will have to wait until next week to test this and report back.
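
For the record, here is a rough sketch of what I plan to try, following
Bow's suggestion below. It is untested on the HPC so far. The file names
(a.hmm, b.hmm, combined.hmm, proteins.fasta, combined.out) and the two
helper function names are placeholders of mine, not anything from the repo.
The first helper is just a Python stand-in for `cat`; the second assumes
Biopython is installed and that the hmmsearch text output is read with the
"hmmer3-text" format name.

```python
from pathlib import Path


def concat_hmm_files(hmm_paths, combined_path):
    """Python stand-in for `cat a.hmm b.hmm > combined.hmm`."""
    with open(combined_path, "wb") as out:
        for path in hmm_paths:
            data = Path(path).read_bytes()
            out.write(data)
            # Keep consecutive models on separate lines if a file
            # lacks a trailing newline.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
    return combined_path


# Then a single hmmsearch run over all models (placeholder file names):
#   hmmsearch -o combined.out combined.hmm proteins.fasta


def iter_query_ids(result_path):
    """Yield the ID of every query result in one hmmsearch text output."""
    # Imported here so the helper above works without Biopython installed.
    from Bio import SearchIO

    for qresult in SearchIO.parse(result_path, "hmmer3-text"):
        yield qresult.id
```

If this works as described, iter_query_ids("combined.out") should give one
ID per HMM model instead of only the first.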

Regards,
--
prakhar

On Wed, Aug 5, 2015 at 12:37 PM, Wibowo Arindrarto <w.arindrarto at gmail.com>
wrote:

> Hi Prakhar,
>
> Thanks for uploading your script and test case, first of all. It helps a
> lot :).
>
> As for your question, indeed I saw that your output is a concatenation
> of two hmmsearch results. If you would like to parse all query
> results, instead you should run hmmsearch with a single input file
> containing all of your queries. In other words, instead of
> concatenating the results, you should concatenate the input HMMs (a
> simple `cat` should do). You will still see the results of each query
> separately, and I don't expect them to influence each other.
>
> This is because the output file begins and ends with a specific
> section. When you concatenate two output files, the parser sees that
> it has encountered the end section of an output file and stops
> parsing. If you run two input HMMs instead, the parser can then see
> all the results before encountering the output file's end section.
>
> Hope this helps :),
> Bow
>
> On Wed, Aug 5, 2015 at 6:00 AM, Prakhar Gaur <prakhar.aaidu16 at gmail.com>
> wrote:
> > Hello,
> >
> > I have been a Biopython & SearchIO user for a little over three weeks.
> >
> > OS - Ubuntu 14.04
> > Biopython Bio.__version__ 1.65
> > HMMER -- 3.1b2
> > Code is here (github) - branch alpha
> >
> > Briefly, if a HMMER 3 text output file is parsed using 'SearchIO.parse()'
> > and that file contains output from more than one search, only the first
> > result is read and the rest are ignored.
> >
> > Details -- I used 'hmmsearch' with Pfam models against a database of
> > proteins. Since I wanted to group some domain models together, I
> > concatenated the results into a single text file, as all the domains
> > concatenated together belong to a single protein.
> >
> > When I use SearchIO.parse() to read this file, the QueryResult object has
> > only the top entry (the first HMM model's result); iterating over the
> > QueryResult does not give the other results.
> >
> > To test this, I split the results file so that each file holds the results
> > of a single hmmsearch run; after this, parsing works fine.
> > But this use case (a single result in one file) is for 'SearchIO.read()'.
> > Consequently, I would expect 'SearchIO.parse()' to be able to parse
> > multiple results from a single file.
> >
> > As a test case from my repo, first run the script
> > 'hmmer-SearchIO-text-parser.py'; the output file 'Q9UP38.out' will be read.
> > This represents the use case in which the result file has multiple (2)
> > entries.
> >
> > Are my expectations of 'SearchIO.parse()' correct? Or am I missing
> > something?
> >
> > Reply, Comments, Pointers etc. are all welcome.
> >
> > --
> > Thanking you,
> > --
> > Prakhar Gaur
> >
> >
> > _______________________________________________
> > Biopython mailing list  -  Biopython at mailman.open-bio.org
> > http://mailman.open-bio.org/mailman/listinfo/biopython
>



-- 
Thanking you,
-- 
Prakhar Gaur