[Biopython-dev] NCBIStandalone Blast HSP parsing
Mark Hoebeke
Mark.Hoebeke at jouy.inra.fr
Mon Oct 17 10:07:13 EDT 2005
Hi all,
I wanted a quick and easy way to determine the endpoints of HSPs extracted from
Blast reports parsed with NCBIStandalone. Unfortunately the HSP class lacks the
query_end and sbjct_end attributes. Googling around led me to a recipe
describing how to compute the endpoint using the total length, gap length and
other niceties. Not exactly intuitive to me.
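
For readers who have not seen that kind of recipe, here is a minimal sketch of the arithmetic it implies. It is not the recipe itself: the function name hsp_end is hypothetical, and it assumes 1-based inclusive coordinates, a plus-strand hit, and query/subject coordinates in the same units as the aligned residues (as in blastn or blastp; translated searches and minus-strand hits need different arithmetic).

```python
def hsp_end(start, aligned_seq, gap_char="-"):
    """Return the 1-based inclusive end coordinate of an aligned segment.

    Hypothetical helper: every non-gap character in the aligned string
    consumes one position of the underlying sequence, so the end is the
    start plus the number of residues, minus one.
    """
    residues = len(aligned_seq) - aligned_seq.count(gap_char)
    return start + residues - 1


# A 9-column alignment row with one gap covers 8 residues.
print(hsp_end(1, "ACGT-ACGT"))
```

Reading the end coordinate straight from the report avoids these assumptions, which is the motivation for the change described next.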
Hence I dove into the NCBIStandalone and HSP modules and made some slight
modifications. Basically I added the two attributes to HSP and the following
snippets to NCBIStandalone (release 1.4b):
972c972
< _query_re = re.compile(r"Query: (\d+)\s*(.+) (\d+)")
---
> _query_re = re.compile(r"Query: (\d+)\s*(.+) \d")
977,978c977
< start, seq, end = m.groups()
< self._hsp.query_end=string.atoi(end);
---
> start, seq = m.groups()
997,998c996,997
< start, seq, end = _re_search(
< r"Sbjct: (\d+)\s*(.+) (\d+)", line,
---
> start, seq = _re_search(
> r"Sbjct: (\d+)\s*(.+) \d", line,
1014c1013
< self._hsp.sbjct_end=string.atoi(end)
---
>
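
To illustrate what the extra capture group does, here is a small standalone sketch outside of NCBIStandalone. Only the regular expression is taken from the diff above; the function parse_query_line and the sample line are made up for the example, and int() stands in for string.atoi.

```python
import re

# Pattern from the diff: start coordinate, aligned sequence, end coordinate.
_query_re = re.compile(r"Query: (\d+)\s*(.+) (\d+)")


def parse_query_line(line):
    """Split one 'Query:' alignment line into (start, sequence, end).

    Hypothetical helper for illustration; it is not part of Biopython.
    """
    m = _query_re.search(line)
    if m is None:
        raise ValueError("not a Query line: %r" % line)
    start, seq, end = m.groups()
    return int(start), seq, int(end)


# Made-up line in the style of a plain-text Blast report.
start, seq, end = parse_query_line("Query: 1   ACGTACGT-ACGT 12")
print(start, seq, end)
```

An HSP usually spans several such lines, so if the attribute is overwritten on each line, as the diff appears to do, the value left at the end is the one from the last line of the HSP.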
Looks too easy to be true, I thought. Now sorry if I'm missing some important
issues here (I'm quite new to Biopython), but is there a reason no one has made
this patch yet?
Thanks for any comments (flames and others).
Cheers,
Mark
--
Mark.Hoebeke at jouy.inra.fr
Unité Statistique & Génome    http://stat.genopole.cnrs.fr
Tél : +33 (0)1 60 87 38 03    Fax : +33 (0)1 60 87 38 09
Tour Evry 2, 523, pl. des Terrasses
F-91000 Evry
PGP : A2AD52E3