[Bioperl-l] search for only C-terminal degenerate motifs

Lucas Carey lcarey at odd.bio.sunysb.edu
Sun Oct 12 00:56:10 EDT 2003


On Fri, Oct 10, 2003 at 04:08:26PM -0400, Aaron J. Mackey wrote:
> Yeah, the problem is you'd really like to *not* search all of the 
> database (for both speed and statistical reasons), only the first n 
> C-terminal residues of each sequence in the database.  Both BLAST and 
Search time isn't an issue; I'll be using mpiBLAST on a 128-CPU cluster. I don't think the statistics matter either, because I'm just looking for genes to test biologically, not drawing conclusions from the database search alone.

> But you said "motif" - are you trying to find:
> 
>   a) exact matches to a given short sequence
Exact matches to one of ~7 sequences located at positions (C-3), (C-2), (C-1) and C, i.e. the last four residues of the protein.
The query file would just be a FASTA file with seven 4-aa query sequences.
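Because the motifs sit at fixed positions (the last four residues), the same check can be sketched in plain Perl without BLAST: read a protein FASTA file and compare each sequence's C-terminal tail against the motif list. This is a minimal sketch under stated assumptions: `cterm_motif` and `scan_fasta` are hypothetical helper names, the FASTA parsing is deliberately simple (headers start with `>`, sequence lines are concatenated), and only AGDK comes from this post, so the remaining motifs would have to be added to the list.

```perl
use strict;
use warnings;

# Hypothetical helper: return the motif that $seq ends with, or undef.
# Positions (C-3)..C are simply the last length($motif) residues.
sub cterm_motif {
    my ($seq, @motifs) = @_;
    $seq = uc $seq;
    $seq =~ s/\*\z//;    # ignore a trailing stop symbol, if present
    for my $m (@motifs) {
        my $len = length $m;
        next if $len > length $seq;
        return $m if substr($seq, -$len) eq uc $m;
    }
    return;
}

# Hypothetical helper: read FASTA records from a filehandle and return
# [header, motif] pairs for sequences ending in one of @motifs.
sub scan_fasta {
    my ($fh, @motifs) = @_;
    my (@hits, $header, $seq);
    my $flush = sub {
        return unless defined $header;
        my $m = cterm_motif($seq, @motifs);
        push @hits, [$header, $m] if defined $m;
    };
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(.*)/) {
            $flush->();                  # finish the previous record
            ($header, $seq) = ($1, '');
        } else {
            $line =~ s/\s+//g;           # sequence lines may carry spaces
            $seq .= $line;
        }
    }
    $flush->();                          # finish the last record
    return @hits;
}

# Usage (the file name is a placeholder):
# open my $fh, '<', 'proteins.fasta' or die "cannot open: $!";
# printf "%s\t%s\n", @$_ for scan_fasta($fh, qw(AGDK));
```

Scanning the tails directly also sidesteps the point about searching the whole database: only the C-terminal residues of each sequence are ever compared.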

>   b) exact matches to a consensus regular expression (e.g.: CX[S|T]C)
Is this possible? Could I search for [R|K][D|E] if I wanted to find a negatively charged residue following a positively charged one?
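In plain Perl a consensus like this becomes a regular expression. One detail to watch is that inside a character class `|` is a literal character, so `[R|K]` would also match a `|`, while `[RK]` means "R or K". A minimal sketch, assuming the consensus uses X for any residue and brackets for alternatives; `consensus_to_cterm_regex` is a hypothetical helper, and `\z` anchors the match at the C-terminus.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a consensus such as "CX[ST]C" or "[RK][DE]"
# into a regex anchored at the C-terminus. X (any residue) becomes '.',
# bracketed classes are kept. '|' inside [...] is a literal character
# in Perl, so write [RK], not [R|K].
sub consensus_to_cterm_regex {
    my ($consensus) = @_;
    (my $re = uc $consensus) =~ s/X/./g;
    return qr/$re\z/;    # \z: the match must end on the last residue
}

# Positive residue followed by a negative one at the very C-terminus.
my $charge_pair = consensus_to_cterm_regex('[RK][DE]');
for my $seq (qw(MAAKD MAAKDA MAARE)) {
    printf "%-7s %s\n", $seq, ($seq =~ $charge_pair ? 'match' : 'no match');
}
```

MAAKD and MAARE end in a K/R followed by D/E and match; MAAKDA does not, because the pair is not at the C-terminus.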

Since I'm fairly competent with Perl but have never used BioPerl, my idea was to do this:

>gi|17647257|ref|NP_523617.1|   Chitinase 1 [Drosophila melanogaster]
 gi|37078008|sp|Q9W5U3|CHI1_DROME   Probable chitinase 1
            Length = 508

 Score = 14.6 bits (27), Expect =  9048
 Identities = 4/4 (100%), Positives = 4/4 (100%)

Query: 1   AGDK 4
           AGDK
Sbjct: 226 AGDK 229

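If my signal sequence is AGDK, I want matches where XXX in 'Length = XXX' equals XXX in 'Sbjct: nnn AGDK XXX', i.e. where the alignment ends on the last residue of the database sequence.
The hit above does not qualify: it ends at residue 229 of 508.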
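The comparison described above can be sketched directly on the plain-text report. This assumes the layout shown in the excerpt (a `>` line per subject, a `Length = N` line, then `Sbjct: start SEQ end` lines); `cterminal_subjects` is a hypothetical helper, and other BLAST versions may format the report differently.

```perl
use strict;
use warnings;

# Hypothetical helper: scan a plain-text BLAST report and return
# [subject_id, aligned_subject_seq] for every alignment whose subject
# end coordinate equals the subject's "Length = N" value.
sub cterminal_subjects {
    my ($fh) = @_;
    my ($name, $length, @found);
    while (my $line = <$fh>) {
        if ($line =~ /^>(\S+)/) {                        # new subject
            ($name, $length) = ($1, undef);
        } elsif ($line =~ /^\s*Length\s*=\s*(\d+)/) {    # subject length
            $length = $1;
        } elsif ($line =~ /^\s*Sbjct:\s*\d+\s+(\S+)\s+(\d+)/) {
            # $2 is the end coordinate printed after the aligned residues
            push @found, [$name, $1] if defined $length && $2 == $length;
        }
    }
    return @found;
}

# Usage (the file name is a placeholder):
# open my $fh, '<', 'blast_report.txt' or die "cannot open: $!";
# print "$_->[0]\n" for cterminal_subjects($fh);
```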

I wouldn't mind using this as an excuse to learn BioPerl, assuming there is a reasonably straightforward way to do this in BioPerl. In the Bio::SearchIO HOWTO I see end('hit') and length('query'), but nothing like length('db_sequence') for the length of the database sequence. Does such a method exist?
If it did, I could do something like this (pseudocode):
if ( end('hit') == length('db_sequence') && (matches('query'))[0] == 4 ) { print query_description; }
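With BioPerl the same test can be sketched through Bio::SearchIO, assuming the subject length is available from the hit object as `$hit->length`, the end of the alignment on the subject from `$hsp->end('hit')`, and the number of identical residues from `$hsp->num_identical`. This is a sketch under those assumptions, not a confirmed recipe; `is_cterminal_hit` and `scan_report` are hypothetical helper names, and the input is assumed to be standard BLAST text output.

```perl
use strict;
use warnings;

# Hypothetical helper: an HSP qualifies when every residue of the motif
# is identical and the alignment ends on the subject's last residue.
sub is_cterminal_hit {
    my ($hit_end, $hit_length, $n_identical, $motif_length) = @_;
    return $hit_end == $hit_length && $n_identical == $motif_length;
}

# Hypothetical helper: walk a BLAST text report with Bio::SearchIO.
# BioPerl is loaded at run time, so the helper above works without it.
sub scan_report {
    my ($file) = @_;
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new(-format => 'blast', -file => $file);
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                next unless is_cterminal_hit($hsp->end('hit'), $hit->length,
                    $hsp->num_identical, $result->query_length);
                print join("\t", $result->query_name, $hit->name,
                    $hit->description || ''), "\n";
            }
        }
    }
}

# Usage: perl scan.pl blast_report.txt   (script name is a placeholder)
scan_report($ARGV[0]) if @ARGV;
```

Keeping the test in a plain subroutine separates the rule (full identity, ends on the last residue) from the report parsing, so the same rule could be reused with a different parser.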
Thank you for your assistance,
-Lucas


