[BioRuby] removing primers and corresponding quality data from sequences
George Githinji
georgkam at gmail.com
Fri Feb 12 08:57:54 UTC 2010
Hi,
I would like to remove both the primers and the flanking sequence: the
portion before the 5' primer and the portion after the 3' primer.
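require 'bio'

# Primer patterns as regexp source strings; [AC] matches A or C
# (inside a character class, '|' is a literal character, not "or").
def primers
  ['G*CACG[AC]AGTTT[CT]GC', 'GC[GA]AAACT[TG]CGTGC', 'G*CCCATTC[GC]TCGAACCA', 'TGGTTCGA[CG]GAATGGGC']
  # primers.collect! { |primer| create_regexp(primer) }
end

# Read every entry of the reads file into an array.
def bioentries(reads_file)
  Bio::FlatFile.auto(reads_file) { |f| f.map { |entry| entry } }
end

# Print each sequence with matches of the first primer deleted.
def remove_primers(file_name)
  reg1 = Regexp.new(primers[0])
  bioentries(file_name).each do |entry|
    # puts ">#{entry.definition}"
    # puts entry.seq
    puts entry.seq.gsub(reg1, '')
  end
end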
This removes the primers but not the portion before the 5' end.
Secondly, it does not give me the corresponding coordinates, so I
cannot remove the associated quality data for the removed region.
Third, the approach seems 'dirty'.
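
One way the coordinates could be recovered is to record match offsets
instead of calling gsub, and then slice the sequence and the quality
scores with the same range. The following is only a minimal sketch under
several assumptions: the four patterns above are written with plain
character classes such as [AC]; the first primer hit in a read is the 5'
primer and the last hit is the 3' primer; matching is case-insensitive;
and the quality scores are already available as an array with one score
per base. The names PRIMER_PATTERNS, ANY_PRIMER, primer_hits and
trim_read are hypothetical helpers, not part of BioRuby.

```ruby
# Hypothetical helpers (plain Ruby, not BioRuby API).
PRIMER_PATTERNS = [
  'G*CACG[AC]AGTTT[CT]GC', 'GC[GA]AAACT[TG]CGTGC',
  'G*CCCATTC[GC]TCGAACCA', 'TGGTTCGA[CG]GAATGGGC'
].freeze

# One regexp that matches any of the four primers, ignoring case.
ANY_PRIMER = Regexp.union(
  PRIMER_PATTERNS.map { |p| Regexp.new(p, Regexp::IGNORECASE) }
)

# Returns [begin, end] offsets (end exclusive) of every primer hit,
# in left-to-right order.
def primer_hits(seq)
  hits = []
  seq.scan(ANY_PRIMER) { hits << Regexp.last_match.offset(0) }
  hits
end

# Keeps only the region between the first and the last primer hit and
# cuts the quality scores with the same coordinates.
# Returns nil when fewer than two primer hits are found.
def trim_read(seq, quals)
  hits = primer_hits(seq)
  return nil if hits.size < 2

  start = hits.first[1] # index just past the 5' primer
  stop  = hits.last[0]  # index where the 3' primer begins
  { start: start, stop: stop,
    seq: seq[start...stop], quals: quals[start...stop] }
end
```

Called as trim_read(entry.seq, scores), this would return the trimmed
sequence, the trimmed scores and the cut coordinates for one read, or
nil for a read without two primer hits, which could then be reported
separately rather than silently passed through.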
On Fri, Feb 12, 2010 at 11:56 AM, George Githinji <georgkam at gmail.com> wrote:
>
> On Fri, Feb 12, 2010 at 11:46 AM, Andrew Grimm <andrew.j.grimm at gmail.com> wrote:
>> I can't really help, but is it primers that you want removed, or the
>> portion of sequence that's before the 5' primer or after the 3'
>> primer?
>>
>> Andrew
>>
>> On Fri, Feb 12, 2010 at 7:35 PM, George Githinji <georgkam at gmail.com> wrote:
>>> Hi All,
>>> I have a list of sequences and corresponding quality files for the
>>> same data. I would like to remove the primers as well as the
>>> corresponding quality information.
>>> The approach that I am using is proving to be dirty and buggy.
>>>
>>> For example, given:
>>> 1. A list of sequences in FASTA format
>>> 2. A list of 4 possible primer patterns (no idea which sequence might
>>> contain which primer)
>>> 3. A list of quality data in phred format for each sequence
>>>
>>> The task is to remove the possible primers from the sequences, along
>>> with anything before the 5' primer or after the 3' primer.
>>> Each sequence has a combination of at least 2 primers, one on the 5'
>>> end and the other on the 3' end.
>>>
>>> Return a list of sequences with the primer ends removed, and the
>>> corresponding quality data with the same regions removed.
>>>
>>> What would be a nice way to approach this problem?
>>> _______________________________________________
>>> BioRuby Project - http://www.bioruby.org/
>>> BioRuby mailing list
>>> BioRuby at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioruby
>>>
>>
>
--
---------------
Sincerely
George
PhD Student
KEMRI/Wellcome-Trust Research Program
Skype: george_g2
Blog: http://biorelated.wordpress.com/