[Bioperl-l] removing redundant accession numbers

Chris Fields cjfields at uiuc.edu
Thu Sep 7 23:04:34 UTC 2006


Not necessarily a Bioperl question, but...

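Use a hash rather than an array. Each ID becomes a key, which springs into
existence the first time it is incremented, and its value is the number of
times that ID was seen: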

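my %ids;    # accession => number of times seen

# Read the file(s) named on the command line, or STDIN, line by line.
while (<>) {
    chomp;
    next unless length;    # ignore blank lines
    $ids{$_}++;            # the key is created on first use
}

# One line per distinct accession, sorted, with its count.
print "$_\t$ids{$_} times\n" for sort keys %ids;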
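
If the output should look exactly like the example in the question
(">id - N times") and keep the IDs in the order they first appear, something
along these lines might do. This is only a sketch: count_accessions is a
made-up helper name, the in-memory filehandle just stands in for the real
file, and the sample data is the list from the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original reply): returns a list of
# [id, count] pairs in order of first appearance.
sub count_accessions {
    my ($fh) = @_;
    my (%count, @order);
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;           # strip newline, CR and trailing blanks
        next unless length $line;     # ignore blank lines
        # Post-increment is false only the first time an ID is seen,
        # so each ID is pushed onto @order exactly once.
        push @order, $line unless $count{$line}++;
    }
    return map { [ $_, $count{$_} ] } @order;
}

# Sample data copied from the question.
my $sample = <<'END';
>ci0100130090
>ci0100130320
>ci0100130340
>ci0100130574
>ci0100130090
>ci0100130804
>ci0100130945
>ci0100130986
>ci0100130090
>ci0100131137
>ci0100131140
>ci0100130320
>ci0100130340
>ci0100130804
>ci0100130945
END

# Open the string as a read-only filehandle; a real script would open the
# accession file here instead.
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
printf "%s - %d times\n", @$_ for count_accessions($fh);
close $fh;
```

On the sample this reports >ci0100130090 3 times and >ci0100130320 2 times.
Redirecting STDOUT to a file gives the output file asked for.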

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of kamesh narasimhan
> Sent: Thursday, September 07, 2006 4:52 PM
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] removing redundant accession numbers
> 
> Hi ppl,
> 
> I am a newbie to Perl/BioPerl programming.
> 
> I currently have a task (which looks a bit daunting to me now...). I
> have a text file containing a set of accession numbers, which look
> like this:
> 
> acession_numbers.txt contains: (a '>' followed by two lowercase
> letters followed by ten digits).
> 
> >ci0100130090
> >ci0100130320
> >ci0100130340
> >ci0100130574
> >ci0100130090
> >ci0100130804
> >ci0100130945
> >ci0100130986
> >ci0100130090
> >ci0100131137
> >ci0100131140
> >ci0100130320
> >ci0100130340
> >ci0100130804
> >ci0100130945
> 
> Some of the accession numbers may be repeated in the file. For
> example, >ci0100130090 appears 3 times, while >ci0100130320 and
> >ci0100130340 each appear 2 times, etc.
> 
> I would like a program to write an output file telling me that
> 
> output file.txt
> 
> >ci0100130090 - 3 times
> >ci0100130320 - 2 times
> .......
> 
> I tried Perl scripting with the idea of setting $/ = '>' and reading
> each element into an array. However, I am not able to proceed and
> seem to be going nowhere.
> 
> Any help with scripting (and, if possible, with comments) in this
> regard will be greatly appreciated.
> 
> Thanks a zillion in advance



