[Bioperl-l] search for only C-terminal degenerate motifs

Aaron J Mackey ajm6q at virginia.edu
Sun Oct 12 09:58:51 EDT 2003


Since you're looking for exact matches (in a defined location, no less),
why do you need BLAST, or any bioinformatics tool?  Doesn't simple string
comparison or regexp matching get you what you need with minimal fuss?
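
A minimal sketch of the regexp approach, written in Python for brevity (the
same patterns work unchanged in Perl). It assumes the database is a plain
FASTA file; read_fasta, c_terminal_hits and the second motif in EXACT_MOTIFS
are hypothetical names and placeholders, not part of BioPerl or of the
original question. Note that inside a character class no "|" is needed, so
the charged-pair motif is written [RK][DE] rather than [R|K][D|E].

```python
import re


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# AGDK comes from the thread; SKLF is only a placeholder for the other motifs.
EXACT_MOTIFS = ["AGDK", "SKLF"]

# "$" anchors the match to the C-terminus, i.e. positions (C-3) through C.
exact_re = re.compile("(?:" + "|".join(EXACT_MOTIFS) + ")$")

# Degenerate motif: a positively charged residue followed by a negatively
# charged one, as the last two residues of the sequence.
degenerate_re = re.compile(r"[RK][DE]$")


def c_terminal_hits(records, pattern):
    """Return headers of records whose sequence ends with the pattern."""
    # Strip a trailing "*" in case the file marks stop codons that way.
    return [header for header, seq in records if pattern.search(seq.rstrip("*"))]
```

Calling c_terminal_hits(read_fasta(open("db.fasta")), exact_re) would list
the matching headers, where db.fasta stands in for whatever protein database
is being searched. Because each sequence is tested once against an anchored
pattern, the internal matches that BLAST reports never have to be filtered
out afterwards.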

-Aaron

On Sun, 12 Oct 2003, Lucas Carey wrote:

> On Fri, Oct 10, 2003 at 04:08:26PM -0400, Aaron J. Mackey wrote:
> > Yeah, the problem is you'd really like to *not* search all of the
> > database (for both speed and statistical reasons), only the first n
> > C-terminal residues of each sequence in the database.  Both BLAST and
> Search time isn't an issue; I'll be using mpiBLAST on a 128-CPU cluster.
> I don't think the statistics will matter, because I'm just looking for
> genes to test biologically, not to draw any conclusions from the db
> search alone.
>
> > But you said "motif" - are you trying to find:
> >
> >   a) exact matches to a given short sequence
> Exact matches to one of ~7 sequences located at the last four positions:
> (C-3) - (C-2) - (C-1) - C. The query file would just be a FASTA file with
> seven 4-aa query sequences.
>
> >   b) exact matches to a consensus regular expression (e.g.: CX[S|T]C)
> Is this possible? Can I search for [R|K][D|E] if I wanted to search for a
> negatively charged aa that follows a positively charged one?
>
> My thought, because I'm fairly competent with Perl but have never used
> BioPerl before, was to do this:
>
> >gi|17647257|ref|NP_523617.1|   Chitinase 1 [Drosophila melanogaster]
>  gi|37078008|sp|Q9W5U3|CHI1_DROME   Probable chitinase 1
>             Length = 508
>
>  Score = 14.6 bits (27), Expect =  9048
>  Identities = 4/4 (100%), Positives = 4/4 (100%)
>
> Query: 1   AGDK 4
>            AGDK
> Sbjct: 226 AGDK 229
>
> If my signal sequence is AGDK, I want to look for matches where XXX in
> 'Length = XXX' is equal to XXX in 'Sbjct: nnn AGDK XXX'.
>
> I would not mind using this as an excuse to learn BioPerl, assuming that
> there is a reasonably straightforward way to go about doing this in
> BioPerl. In the Bio::SearchIO HOWTO I see end('hit') and length('query')
> but no length('db_sequence') or something like that. Does this method
> exist? I could do
> if( (end('hit') == length('db_sequence')) && (matches('query')[0] == 4)) { print query_description; }
> -Lucas
> thank you for your assistance
>
>

-- 
 Aaron J Mackey
 Pearson Laboratory
 University of Virginia
 (434) 924-2821
 amackey at virginia.edu



