[Bioperl-l] removing redundant accession numbers

Hilmar Lapp hlapp at gmx.net
Fri Sep 8 12:34:09 UTC 2006


Actually, this is trivial in Unix:

	$ sort my.list.of.accessions | uniq -c
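
To get closer to the "ID - N times" layout asked for below, the counts can be reordered with awk. A minimal sketch, assuming one accession per line with no embedded whitespace; the file names accessions.txt, counts.txt and unique.txt are made up for illustration, and the `sort -u` step is added only because the subject line asks about removing the redundant entries:

```shell
#!/bin/sh
# Sample input in the format from the original post, one accession per line.
# (accessions.txt is a hypothetical file name.)
printf '%s\n' '>ci0100130090' '>ci0100130320' '>ci0100130090' \
    '>ci0100130340' '>ci0100130320' '>ci0100130090' > accessions.txt

# uniq -c prints "count ID"; awk swaps the columns into "ID - count times".
sort accessions.txt | uniq -c |
    awk '{ print $2 " - " $1 " times" }' > counts.txt

# If only the de-duplicated list is wanted, sort -u is enough.
sort -u accessions.txt > unique.txt

cat counts.txt
```

With the sample input above, counts.txt holds three lines, the first being ">ci0100130090 - 3 times", and unique.txt lists each accession once.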

-hilmar

On Sep 7, 2006, at 7:04 PM, Chris Fields wrote:

> Not necessarily a Bioperl question, but...
>
> Use a hash and autovivification, not an array:
>
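> my %ids;    # accession => number of times seen
>
> while (<>) {
>     chomp;
>     $ids{$_}++;    # the key is autovivified on first sight
> }
>
> # one line per distinct accession, with its count
> print "$_\t$ids{$_} times\n" for sort keys %ids;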
>
> Chris
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of kamesh narasimhan
>> Sent: Thursday, September 07, 2006 4:52 PM
>> To: Bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] removing redundant accession numbers
>>
>> Hi ppl,
>>
>> I am a newbie to Perl/BioPerl programming.
>>
>> I currently have a task (which looks a bit daunting to me now...). I
>> have a text file containing a set of accession numbers, which look
>> like this.
>>
>> acession_numbers.txt contains (a '>' followed by two lowercase
>> letters followed by ten digits):
>>
>> >ci0100130090
>> >ci0100130320
>> >ci0100130340
>> >ci0100130574
>> >ci0100130090
>> >ci0100130804
>> >ci0100130945
>> >ci0100130986
>> >ci0100130090
>> >ci0100131137
>> >ci0100131140
>> >ci0100130320
>> >ci0100130340
>> >ci0100130804
>> >ci0100130945
>>
>> Some of the accession numbers may be repeated in the file; for
>> example, >ci0100130090 appears 3 times, while >ci0100130320 and
>> >ci0100130340 appear 2 times each.
>>
>> I would like a program that writes an output file telling me how
>> often each accession number occurs:
>>
>> output file.txt
>>
>> >ci0100130090 - 3 times
>> >ci0100130320 - 2 times
>> .......
>>
>> I tried writing a Perl script that sets $/ = '>' and reads each
>> element into an array; however, I am not able to proceed and seem to
>> be going nowhere.
>>
>> Any help with scripting (and if possible with comments) in this
>> regard will be greatly appreciated.
>>
>> Thanks a zillion in advance
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================







