[Bioperl-l] how-to-remove-redundant-lines

Heikki Lehvaslaiho heikki at ebi.ac.uk
Wed Jun 29 05:00:38 EDT 2005


Vijayraj,

Your problem, in mathematical terms, is comparing sets.

In pseudocode:

parse the first line, create a set, write the line
add the set to an array
for each subsequent line {
    parse the line, create a set
    for each old set in the array {
        if this set is a subset of the old set {
            skip to the next line
        }
    }
    # if we are here, the set is not contained in any earlier set
    add the set to the array, write the line
}

The output will contain only the non-redundant lines, as long as each line
comes after any line that contains it, as it does in your example.
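
The pseudocode above could be sketched in plain Perl roughly as follows. This
is only a minimal sketch, not a tested solution for your real data: it uses
hashes as sets instead of a CPAN module, the names is_subset and
remove_redundant are made up for illustration, and it assumes that members
are separated by whitespace.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return true if every key of %$small is also a key of %$big.
sub is_subset {
    my ($small, $big) = @_;
    for my $member (keys %$small) {
        return 0 unless exists $big->{$member};
    }
    return 1;
}

# Hypothetical helper: takes a list of lines and returns those lines
# that are not a subset of any line kept before them.
sub remove_redundant {
    my @lines = @_;
    my (@seen, @kept);    # sets kept so far; lines to write out
    LINE: for my $line (@lines) {
        my @members = split ' ', $line;
        next LINE unless @members;    # skip blank lines
        my %set = map { $_ => 1 } @members;
        for my $old (@seen) {
            # an earlier line already contains all of these members
            next LINE if is_subset(\%set, $old);
        }
        push @seen, \%set;
        push @kept, $line;
    }
    return @kept;
}

# Usage: the five lines from the original question.
my @input = (
    '1 2 5 7 8 11',
    '2 5 7 8 11',
    '3 13 17 19',
    '4 21 45 67',
    '5 7 8 11',
);
print "$_\n" for remove_redundant(@input);
```

Like the pseudocode, this depends on the order of the lines: a short line
that comes before the longer line containing it is kept. If that can happen
in your file, sorting the lines by decreasing number of members first should
take care of it.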

There are a lot of modules on CPAN that can do the set algebra for you. One of
them is Set::Scalar: http://search.cpan.org/~jhi/Set-Scalar-1.19/


Yours,
 -Heikki

On Wednesday 29 June 2005 08:19, vijayaraj nagarajan wrote:
> hi
> i have a cluster file with contents like this:
>
> 1 2 5 7 8 11
> 2 5 7 8 11
> 3 13 17 19
> 4 21 45 67
> 5 7 8 11
>
> Now lines 1, 2 and 5 are redundant. I need to
> remove the 2nd and 5th lines from the file,
> retaining only the first line, since the first line
> contains all the members present in the 2nd and 5th lines...
>
> Could anyone suggest how to parse this file to
> remove such redundant lines using Perl?
> Any help and suggestions in this regard would be
> greatly appreciated.
>
> thanks
>
> vijayaraj nagarajan
> research assistant
> the university of southern mississippi
> ms, usa
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

