[Biopython-dev] NCBIStandalone Blast HSP parsing
    Mark Hoebeke 
    Mark.Hoebeke at jouy.inra.fr
       
    Mon Oct 17 10:07:13 EDT 2005
    
    
  
Hi all,
I wanted a quick and easy way to determine the endpoints of HSPs extracted from
Blast reports parsed with NCBIStandalone. Unfortunately the HSP class lacks the
query_end and sbjct_end attributes. Googling around led me to a recipe
describing how to compute the endpoint using the total length, gap length and
other niceties. Not exactly intuitive to me.
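For reference, the arithmetic behind such a recipe can be sketched roughly as below. This is only a sketch under stated assumptions: the coordinates are 1-based and inclusive, the aligned string uses "-" for gaps, and the HSP is an untranslated, forward-strand match (translated or reverse-strand hits need different stepping). The helper name hsp_end is hypothetical and not part of Biopython.

```python
def hsp_end(start, aligned_seq, gap_char="-"):
    """Return the 1-based inclusive end coordinate of an aligned segment.

    Hypothetical helper: assumes an untranslated, forward-strand HSP,
    so every non-gap character advances the coordinate by exactly one.
    """
    # Gap characters are alignment padding and consume no sequence positions.
    residues = len(aligned_seq) - aligned_seq.count(gap_char)
    return start + residues - 1


# 8 residues starting at position 10 cover positions 10..17.
print(hsp_end(10, "ACGT--ACGT"))
```

With an HSP object, start and aligned_seq would presumably come from its query_start and query attributes (or the sbjct equivalents).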
Hence I dove into the NCBIStandalone and HSP modules and made some slight
modifications. Basically I added the two attributes to HSP and the following
snippets to NCBIStandalone (release 1.4b). In the diff below, lines marked <
are my patched version and lines marked > are the original:
972c972
<     _query_re = re.compile(r"Query: (\d+)\s*(.+) (\d+)")
---
>     _query_re = re.compile(r"Query: (\d+)\s*(.+) \d")
977,978c977
<         start, seq, end = m.groups()
<         self._hsp.query_end = string.atoi(end)
---
>         start, seq = m.groups()
997,998c996,997
<         start, seq, end = _re_search(
<             r"Sbjct: (\d+)\s*(.+) (\d+)", line,
---
>         start, seq = _re_search(
>             r"Sbjct: (\d+)\s*(.+) \d", line,
1014c1013
<       self._hsp.sbjct_end=string.atoi(end)
---
>
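To illustrate what the patched pattern captures, here is a small standalone sketch that applies the same regular expression to a made-up alignment line. The sample line and the parse_query_line helper are assumptions for illustration only; they are not taken from NCBIStandalone itself.

```python
import re

# Same pattern as in the patched version: the trailing (\d+) now captures
# the end coordinate instead of merely matching a single digit.
_query_re = re.compile(r"Query: (\d+)\s*(.+) (\d+)")


def parse_query_line(line):
    """Return (start, aligned_sequence, end) for one 'Query:' line.

    Hypothetical helper for illustration, not a Biopython function.
    """
    m = _query_re.search(line)
    if m is None:
        raise ValueError("not a Query line: %r" % line)
    start, seq, end = m.groups()
    return int(start), seq, int(end)


# Made-up example line in the layout of a plain-text BLAST report.
print(parse_query_line("Query: 64 AAGATG-CTGA 73"))
```

Since an HSP usually spans several Query/Sbjct lines, assigning the end on every line, as the patch appears to do, would leave the value from the last line in place once the HSP has been read.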
Looks too easy to be true, I thought. Now sorry if I'm missing some important
issues here (I'm quite new to Biopython), but is there a reason no one has made
this patch yet?
Thanks for any comments (flames and others).
Cheers,
Mark
--
----------------------------Mark.Hoebeke at jouy.inra.fr-----------------------
Unité Statistique & Génome    _/_/_/    _/_/_/  http://stat.genopole.cnrs.fr
Tél : +33 (0)1 60 87 38 03  _/        _/          Fax : +33 (0)1 60 87 38 09
Tour Evry 2,                 _/_/    _/  _/_/         523, pl. des Terrasses
F-91000,                        _/  _/    _/                            Evry
PGP : A2AD52E3           _/_/_/      _/_/_/