[BioPython] A Blast filter

Eirik Sønneland eirik.sonneland at student.umb.no
Tue Dec 13 07:07:21 EST 2005


Hi,

I'm working on a high-throughput SNP detection pipeline and am blasting
~150 000 trace sequences against a contig/refseq database and a trace
archive database downloaded from NCBI.

To find SNPs, we repeat-mask the 150 000 traces AND the contig database
BEFORE running Blast. This way we focus the search on the NON-repeated
regions of the sequences. When selecting Blast hits (HSPs) I use e-value
(<= e-17), Blast score (>= 1050) and identity (>= 97%). This is very
stringent, but since I know my trace sequences are about 1000 bp, I make
sure to select hits that have a minimum of 500 bp matching/aligned (Blast
score) and no less than 97% identity within that aligned region of the
sequences. You will need to tune these parameters to meet your needs.

The code for this is what the cookbook explains for sorting
blast.records. I can send you an extract from my script if you are
interested.
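
As a rough illustration of the selection step described above (a sketch
only, not the extract from my script), the thresholds could be applied
like this. It assumes HSP objects that expose expect, score, identities
and align_length, as the HSPs in Biopython's Bio.Blast.NCBIXML records
do, and it assumes that "Blast score" means the raw score rather than the
bit score. The names passes_filter and select_hits, and the demo data, are
made up for the example.

```python
from types import SimpleNamespace

# Thresholds quoted in the message; tune them for your own data.
MAX_EVALUE = 1e-17
MIN_SCORE = 1050        # assumption: raw HSP score, not the bit score
MIN_IDENTITY = 0.97     # fraction of identical positions in the alignment


def passes_filter(hsp, max_evalue=MAX_EVALUE, min_score=MIN_SCORE,
                  min_identity=MIN_IDENTITY):
    """Return True if an HSP meets all three thresholds."""
    if hsp.align_length == 0:
        return False
    identity = hsp.identities / hsp.align_length
    return (hsp.expect <= max_evalue
            and hsp.score >= min_score
            and identity >= min_identity)


def select_hits(records, **thresholds):
    """Yield (query, hit title, hsp) for every HSP that passes the filter."""
    for record in records:
        for alignment in record.alignments:
            for hsp in alignment.hsps:
                if passes_filter(hsp, **thresholds):
                    yield record.query, alignment.title, hsp


# Stand-in objects so the sketch runs without Blast output.
# With Biopython, records would instead come from something like
# NCBIXML.parse(handle) on an XML result file.
good = SimpleNamespace(expect=1e-30, score=1200, identities=980,
                       align_length=1000)
bad = SimpleNamespace(expect=1e-30, score=1200, identities=900,
                      align_length=1000)
demo_record = SimpleNamespace(
    query="trace_1",
    alignments=[SimpleNamespace(title="contig_A", hsps=[good, bad])],
)
kept = list(select_hits([demo_record]))
```

Here only the first demo HSP is kept, because the second one falls below
the 97% identity cut-off.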

Eirik Sønneland


>  Hi all,
>
>  I wonder if anyone has developed a filter for blast searching against a
> local database (like nr) before performing a multiple sequence alignment
> of a large number of sequences...
>
>  Any tips would be much appreciated!
>
>  Thanks in advance!
>
>  Alessandro



