[Bioperl-l] Scripting help to identify adaptors count in reads

Fields, Christopher J cjfields at illinois.edu
Thu Nov 10 21:13:12 UTC 2011


If the adaptors are masked (e.g. are represented by the N's below) or if you are really confident that the adaptors don't have base mis-calls, why not use split?  Maybe with something like 'scalar(split(/N+/, $foo))' or scalar(split(/$adaptor/, $foo)).  
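A minimal sketch of the counting idea, assuming the adaptors are masked as runs of N's. It is not code from this thread: the sub names count_adaptors and tally_reads, the minimum run length of 4, and the toy reads are all assumptions for illustration. It uses a global match in list context instead of split, because split returns the number of fields (one more than the number of internal separators, with trailing empty fields dropped) rather than the number of adaptors.

```perl
use strict;
use warnings;

# Count masked adaptors in one read. Each run of at least $min_run N's
# is treated as one adaptor (the default threshold of 4 is an assumption).
sub count_adaptors {
    my ($seq, $min_run) = @_;
    $min_run = 4 unless defined $min_run;
    # A global match in list context returns one element per run;
    # assigning through the empty list yields the number of matches.
    my $count = () = $seq =~ /N{$min_run,}/gi;
    return $count;
}

# Tally how many reads carry 0, 1, 2, ... adaptors.
# Returns a hash reference: adaptor count => number of reads.
sub tally_reads {
    my @seqs = @_;
    my %freq;
    $freq{ count_adaptors($_) }++ for @seqs;
    return \%freq;
}

# Toy reads, made up for illustration (not the reads from the question).
my @reads = qw(
    GCCTTGCCANNNNNNNNATCAGG
    GCCTNNNNNNATCANNNNNNCTGANNNNNGGC
    GCCTCCCTCGCGCATCAG
);

my $freq = tally_reads(@reads);
for my $n (sort { $a <=> $b } keys %$freq) {
    printf "%d adaptor(s)\t%d read(s)\n", $n, $freq->{$n};
}
```

For real data the sequences would come from a FASTA parser (Bio::SeqIO, for example) rather than a hard-coded list, and the run-length threshold should be adjusted to how the adaptors were masked.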

tr/// won't work for the reasons Leon mentioned; it transliterates individual characters according to a mapping rather than matching a pattern.  '$foo =~ tr/ATGCatgc/TACGtacg/', for instance, converts $foo to its complement sequence (it doesn't match the pattern /ATGCatgc/).
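To make the difference concrete, here is a small sketch; the variable names and toy strings are assumptions for illustration, not taken from the thread. tr/// treats its search list as a set of single characters and returns how many characters it touched, whereas a global match in list context counts occurrences of a whole pattern.

```perl
use strict;
use warnings;

my $read    = 'AANNNNCCANNNNGG';   # toy read with two masked adaptors
my $adaptor = 'NNNN';              # hypothetical adaptor string

# tr/// with an empty replacement list leaves the string unchanged and
# returns the number of characters from the search list that it found.
my $n_chars = ($read =~ tr/N//);

# A global match in list context yields one element per non-overlapping
# hit, so this counts occurrences of the whole adaptor string.
my $n_hits = () = $read =~ /\Q$adaptor\E/g;

# Complementing is a per-character mapping, which is what tr/// is for.
my $comp = 'AACGtt';
$comp =~ tr/ACGTacgt/TGCAtgca/;

print "N characters: $n_chars\n";   # 8
print "adaptor hits: $n_hits\n";    # 2
print "complement:   $comp\n";      # TTGCaa
```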

chris

On Nov 10, 2011, at 10:06 AM, Juan Jovel wrote:

> 
> There are many ways to do it. 
> Perhaps the simplest is to count the number of times the adapter sequence (or part of it) appears in each read. 
> For example: 
> $adapter_matches = tr/adapter_sequence/adapter_sequence/;# $adapter_matches will store the number of times the adapter sequence is repeated. 
> You then place that result in a hash bin:
> my %adapter_frequency;
> my $class = "$adapter_matches";
> if (exists $adapter_frequency{$class}) {
>     $adapter_frequency{$class}++;
> } else {
>     $adapter_frequency{$class} = 1;
> }
> # Then you can sort and output your classes
> foreach $class (sort keys %adapter_frequency) {
>     print "$class\t$adapter_frequency{$class}\n";
> }
> 
> You can work out the details, but something like this should work.
> 
>> Date: Thu, 10 Nov 2011 04:29:55 -0800
>> From: casaburi at ceinge.unina.it
>> To: Bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l]  Scripting help to identify adaptors count in reads
>> 
>> 
>> Hi everybody,
>> 
>> I have some 454 reads that contain adaptors (masked as NNNN...): one, two or three
>> adaptors per read, depending on the read. Is there any way to
>> determine how many reads have 1 adaptor, how many have 2 and how many have 3,
>> out of the total?
>> 
>> >271-88
>> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCAGGTGCCTACG
>> >272-88
>> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCANNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTTGCCAGCCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGC
>> >273-88
>> GCCTCCCTCGCGCATCAGATCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCA
>> >274-88
>> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTCCCTCGCGCCATCAGATCGTNNNNNNNNNNNNNNNNNNTCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCATCAGATCGTAGGCACCATCAA
>> 
>> The problem is that some adaptors occur in the middle of the sequences
>> because the reads come from a concatemerization experimental design (the
>> miRNAs sit between the NNNNNN... runs). So I am looking for a script or tool that reports
>> how many reads have 1 adaptor, how many have 2, and so on (the maximum is 4), relative to the total
>> number of reads. Can anyone suggest a script or tool that does this?
>> Thanks.
>> 
>> Thank you very much 
>> -- 
>> View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32818254.html
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l




