[Biopython] Regarding Bio.SearchIO Hmmer Use Case

Wibowo Arindrarto w.arindrarto at gmail.com
Wed Aug 5 07:07:16 UTC 2015


Hi Prakhar,

Thanks for uploading your script and test case, first of all. It helps a lot :).

As for your question, I saw that your output is indeed a concatenation
of two hmmsearch results. If you would like to parse all query
results, you should instead run hmmsearch once with a single input file
containing all of your query HMMs. In other words, instead of
concatenating the results, concatenate the input HMMs (a
simple `cat` should do). You will still see the results of each query
separately, and I don't expect them to influence each other.
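
A rough sketch of that workflow in Python, in case it is useful. The
function names and file paths here are made up for illustration, and it
assumes `hmmsearch` is on your PATH; `-o` is the hmmsearch option that
writes the text output to a file, and "hmmer3-text" is the SearchIO format
name for that output.

```python
import shutil
import subprocess
from pathlib import Path


def concatenate_hmms(hmm_paths, combined_path):
    """Join several HMM files into one, in order (same as `cat a.hmm b.hmm > all.hmm`)."""
    with open(combined_path, "wb") as out:
        for path in hmm_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(combined_path)


def search_and_parse(combined_hmm, seq_db, out_path):
    """Run hmmsearch once with all query HMMs and return an iterator of QueryResult objects."""
    # Imported here so concatenate_hmms() can be used without Biopython installed.
    from Bio import SearchIO

    # One hmmsearch run produces one output file with a single end section.
    subprocess.run(
        ["hmmsearch", "-o", str(out_path), str(combined_hmm), str(seq_db)],
        check=True,
    )
    return SearchIO.parse(str(out_path), "hmmer3-text")
```

You would then loop over `search_and_parse("all.hmm", "proteins.fasta",
"all.out")` (hypothetical file names) and get one QueryResult per input HMM.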

This is because each output file begins and ends with a specific
section. When you concatenate two output files, the parser sees that
it has encountered the end section of the first output file and stops
parsing. If you run hmmsearch once with both input HMMs instead, the
parser can see all the results before encountering the output file's
end section.
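
If you already have concatenated output files and would rather not rerun
the searches, one possible workaround is to split the text at each end
section and parse the pieces one by one. This is only a sketch, not
something SearchIO does for you: the helper names are hypothetical, and it
assumes each run's output ends with a line containing just `[ok]`, which is
what I would expect from HMMER 3 text output.

```python
import io


def split_concatenated_runs(text, end_marker="[ok]"):
    """Split concatenated hmmsearch text output into one string per run."""
    runs, current = [], []
    for line in text.splitlines(keepends=True):
        current.append(line)
        # Assumption: a line holding only the end marker closes one run.
        if line.strip() == end_marker:
            runs.append("".join(current))
            current = []
    # Keep trailing text that has no end marker (for example, a truncated run).
    if "".join(current).strip():
        runs.append("".join(current))
    return runs


def parse_concatenated(path):
    """Yield QueryResult objects from every run in a concatenated output file."""
    # Imported here so split_concatenated_runs() works without Biopython installed.
    from Bio import SearchIO

    with open(path) as handle:
        text = handle.read()
    for run in split_concatenated_runs(text):
        # Each piece is a complete single-run output, so the parser reads it fully.
        yield from SearchIO.parse(io.StringIO(run), "hmmer3-text")
```

Rerunning hmmsearch with the concatenated HMMs is still the cleaner option,
since it does not depend on the exact layout of the end section.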

Hope this helps :),
Bow

On Wed, Aug 5, 2015 at 6:00 AM, Prakhar Gaur <prakhar.aaidu16 at gmail.com> wrote:
> Hello,
>
> Biopython & SearchIO user for over three weeks.
>
> OS - Ubuntu 14.04
> Biopython Bio.__version__ 1.65
> HMMER -- 3.1b2
> Code is here (github) - branch alpha
>
> Briefly, if a HMMER 3 text output file is parsed using 'SearchIO.parse()' and
> that file contains output from more than one search, only the first result is
> read and the rest are ignored.
>
> Details -- I used 'hmmsearch' with Pfam models against a database of proteins.
> Since I wanted to group some domain models together, I concatenated the
> results into a single text file, as all the domains concatenated together
> belong to a single protein.
>
> When I use SearchIO.parse() to read this file, the QueryResult object has
> only the top entry (the first HMM model's result); iterating over the
> QueryResult does not give the other results.
>
> To test this, I split the results file so that each file holds the results
> of a single hmmsearch run; after this, parsing works fine.
> But this use case (a single result in one file) is for 'SearchIO.read()'.
> Consequently, I would expect 'SearchIO.parse()' to be able to parse multiple
> results from a single file.
>
> As a test case from my repo, first run the script
> 'hmmer-SearchIO-text-parser.py'; the output file 'Q9UP38.out' will be read.
> This represents the use case in which the result file has multiple (2)
> entries.
>
> Are my expectations of 'SearchIO.parse()' correct? Or am I missing
> something?
>
> Replies, comments, pointers, etc. are all welcome.
>
> --
> Thanking you,
> --
> Prakhar Gaur

