[Biopython-dev] NCBIStandalone Blast HSP parsing

Yair Benita y.benita at wanadoo.nl
Mon Oct 17 19:45:47 EDT 2005


Hi Mark,
This issue has already been fixed. During the last review of NCBIStandalone
that I did with Jeff Chang, the query_end and sbjct_end attributes were added
to the HSP class. Just grab the latest version of NCBIStandalone from CVS.
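
As a rough illustration of the idea (a sketch only, not the actual
NCBIStandalone code): the endpoint is already printed at the end of each
alignment line, so a third capture group in the regular expression is enough
to read it. The names parse_alignment_line and end_from_length and the sample
lines are hypothetical, and the length-based cross-check assumes plus-strand
coordinates counted in the same units as the sequence (e.g. blastp).

```python
import re

# Hypothetical helper for illustration; not part of NCBIStandalone.
# Matches lines such as "Query: 1   MKV-LA 5" or "Sbjct: 10  MKVALA 15".
_ALIGN_RE = re.compile(r"(Query|Sbjct): (\d+)\s*(.+) (\d+)")


def parse_alignment_line(line):
    """Return (label, start, seq, end) for one Query/Sbjct alignment line."""
    m = _ALIGN_RE.search(line)
    if m is None:
        raise ValueError("not an alignment line: %r" % line)
    label, start, seq, end = m.groups()
    # int() is used here in place of the older string.atoi().
    return label, int(start), seq.strip(), int(end)


def end_from_length(start, seq):
    """Derive the endpoint from the start and the ungapped length.

    Assumption: plus strand, one coordinate unit per residue. This is the
    kind of arithmetic that is needed when the end is not parsed directly.
    """
    return start + len(seq.replace("-", "")) - 1


if __name__ == "__main__":
    label, start, seq, end = parse_alignment_line("Query: 1   MKV-LA 5")
    print(label, start, seq, end)
    print(end_from_length(start, seq) == end)
```

Reading the endpoint directly avoids the gap arithmetic and also keeps
working in cases where that arithmetic would need adjusting, such as
minus-strand hits.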

Yair

> From: Mark Hoebeke <Mark.Hoebeke at jouy.inra.fr>
> Organization: INRA - MIA
> Date: Mon, 17 Oct 2005 16:07:13 +0200
> To: <biopython-dev at biopython.org>
> Subject: [Biopython-dev] NCBIStandalone Blast HSP parsing
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hi all,
> 
> I wanted a quick and easy way to determine the endpoints of HSPs extracted from
> Blast reports parsed with NCBIStandalone. Unfortunately the HSP class lacks the
> query_end and sbjct_end attributes. Googling around led me to a recipe
> describing how to compute the endpoint using the total length, gap length and
> other niceties. Not exactly intuitive to me.
> 
> Hence I dove into the NCBIStandalone and HSP modules and made some slight
> modifications. Basically I added the two attributes to HSP and the following
> snippets to NCBIStandalone (release 1.4b):
> 
> 972c972
> <     _query_re = re.compile(r"Query: (\d+)\s*(.+) (\d+)")
> ---
> >     _query_re = re.compile(r"Query: (\d+)\s*(.+) \d")
> 977,978c977
> <         start, seq, end = m.groups()
> <         self._hsp.query_end=string.atoi(end);
> ---
> >         start, seq = m.groups()
> 997,998c996,997
> <         start, seq, end = _re_search(
> <             r"Sbjct: (\d+)\s*(.+) (\d+)", line,
> ---
> >         start, seq = _re_search(
> >             r"Sbjct: (\d+)\s*(.+) \d", line,
> 1014c1013
> <         self._hsp.sbjct_end=string.atoi(end)
> ---
> >
> 
> Looks too easy to be true, I thought. Sorry if I'm missing some important
> issues here (I'm quite new to BioPython), but is there a reason no one has made
> this patch yet?
> 
> Thanks for any comments (flames and others.)
> 
> Cheers,
> 
> Mark
> 
> 
> - --
> - ----------------------------Mark.Hoebeke at jouy.inra.fr-----------------------
> Unité Statistique & Génome    _/_/_/    _/_/_/  http://stat.genopole.cnrs.fr
> Tél : +33 (0)1 60 87 38 03  _/        _/          Fax : +33 (0)1 60 87 38 09
> Tour Evry 2,                 _/_/    _/  _/_/         523, pl. des Terrasses
> F-91000,                        _/  _/    _/                            Evry
> PGP : A2AD52E3           _/_/_/      _/_/_/
> 
> 
> 
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
> 
> iD8DBQFDU7ARa3nTV6KtUuMRArBqAKC/m4i+VpVaU3clvOkMuYkfRrZQ+QCfbRKg
> gBBW5wNKS3sb/Uqr31eumx8=
> =vSWV
> -----END PGP SIGNATURE-----
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at biopython.org
> http://biopython.org/mailman/listinfo/biopython-dev
> 




