[Bioperl-l] how to remove identical sequences from a dataset
Shaohua Fan
lengjingmao at gmail.com
Tue Aug 5 03:36:23 EDT 2008
Hi there,
I have a dataset of about 200 sequences, some of which are identical. Is there a BioPerl module that can remove the identical sequences?
Thanks a lot.
yours,
shaohua
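
One common approach, independent of any particular module, is to remember each sequence string in a hash and keep only the first record that carries it. The sketch below is a minimal illustration under stated assumptions: `unique_by_sequence` is a hypothetical helper (not a BioPerl function), records are plain [id, sequence] pairs, and "identical" is taken to mean the same residues ignoring case.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): keep the first record for each
# distinct sequence string. Records are [id, sequence] array references.
sub unique_by_sequence {
    my @records = @_;
    my %seen;
    # uc() makes the comparison case-insensitive; this is an assumption about
    # what "identical" should mean for the dataset.
    return grep { !$seen{ uc($_->[1]) }++ } @records;
}

# Toy data standing in for the real dataset.
my @records = (
    [ seq1 => 'TTAGCCTAA' ],
    [ seq2 => 'TTAGCAGAA' ],
    [ seq3 => 'ttagcctaa' ],    # same residues as seq1, different case
);

# Print the surviving records in FASTA layout.
print ">$_->[0]\n$_->[1]\n" for unique_by_sequence(@records);
```

With BioPerl, the records would typically come from Bio::SeqIO (reading with next_seq, taking the string from each object's seq method, and writing survivors with write_seq); the hash test itself stays the same.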
----- Original Message -----
From: "Benbo" <btemperton at googlemail.com>
To: <Bioperl-l at lists.open-bio.org>
Sent: Sunday, August 03, 2008 4:05 AM
Subject: [Bioperl-l] Finding possible primers regex
>
> Hi there,
> I'm trying to write a perl script to scan an aligned multiple entry fasta
> file and find possible primers. So far I've produced a string which contains
> bases which match all sequences and * where they don't match e.g.
> 1) TTAGCCTAA
> 2) TTAGCAGAA
> 3) TTACCCTAA
>
> would give TTA*C**AA.
>
> I want to parse this string and pull out all sequences which are 18-21 bp in
> length and have no more than 4 * in them.
>
> So far, I've got this:
>
> while($fragment_match =~ /([GTAC*]{18,21})/g){
> print "$1\n";
> }
>
> hoping to match all fragments 18-21 characters in length. However, even that
> doesn't work: it essentially chunks the string into 21-character blocks,
> rather than giving the overlapping ranges I hoped for:
> 0-18
> 0-19
> 0-20
> 0-21
> 1-19
> 1-20
> 1-21
> 1-22
>
> etc.
>
> Can anyone let me know whether this is already possible in BioPerl, or how
> one would go about it with a regex? Sadly I'm fairly new to Perl and still
> getting to grips with BioPerl, so please treat me gently :).
>
> Many thanks,
>
> Ben
>
>
>
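
On the quoted primer question: the loop in that message yields non-overlapping chunks because each /g match consumes the characters it matched, and the greedy {18,21} quantifier always takes the longest run available. One way to get every overlapping window is to put the capture inside a zero-width lookahead and try each length separately. The sketch below is only an illustration under stated assumptions: `candidate_windows` and its parameters are hypothetical names, and the input is assumed to be a consensus string of G, T, A, C and * characters.

```perl
use strict;
use warnings;

# Hypothetical helper: return every window of $min..$max characters that
# contains at most $max_stars '*' characters, as [start_offset, window] pairs.
sub candidate_windows {
    my ($consensus, $min, $max, $max_stars) = @_;
    my @hits;
    for my $len ($min .. $max) {
        # The capture sits inside a zero-width lookahead, so the match itself
        # consumes nothing and the next attempt starts one position later.
        while ($consensus =~ /(?=([GTAC*]{$len}))/g) {
            my ($start, $window) = ($-[0], $1);
            # tr with an empty replacement counts characters without
            # modifying the string.
            my $stars = ($window =~ tr/*//);
            push @hits, [ $start, $window ] if $stars <= $max_stars;
        }
    }
    return @hits;
}

# Short demonstration on the 9-character example from the message; for real
# primers the call would be candidate_windows($fragment_match, 18, 21, 4).
for my $hit (candidate_windows('TTA*C**AA', 3, 4, 1)) {
    printf "%d-%d %s\n", $hit->[0], $hit->[0] + length($hit->[1]), $hit->[1];
}
```

Looping over the lengths keeps the pattern simple; a single lookahead with {18,21} would report only one window per start position.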