[Bioperl-l] how-to-remove-redundant-lines

Heikki Lehvaslaiho heikki at ebi.ac.uk
Wed Jun 29 05:20:40 EDT 2005


... and if your sets are not sorted in decreasing order of size, you cannot print them out
immediately. Instead, you have to store them in a hash, test whether each new set is
a superset or a subset of each existing set, remove and add sets in the
hash accordingly, and print the hash elements at the end.
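A minimal core-Perl sketch of the order-independent approach described above. Each set is a hash of its members, and `%kept` plays the role of the hash of existing sets. The sub names `is_subset` and `remove_redundant_any_order` and the `line`/`set` record layout are hypothetical, chosen for illustration. Because Perl hash order is arbitrary, the sketch also records each line's position and sorts on it before returning, so the surviving lines come out in input order.

```perl
use strict;
use warnings;

# True if every member of hash ref $small also exists in hash ref $big.
sub is_subset {
    my ($small, $big) = @_;
    for my $member (keys %$small) {
        return 0 unless exists $big->{$member};
    }
    return 1;
}

# Order-independent filter: lines may arrive in any order.
# %kept maps a line's position to { line => text, set => { member => 1, ... } }.
sub remove_redundant_any_order {
    my @lines = @_;
    my %kept;
    LINE: for my $i (0 .. $#lines) {
        my %set = map { $_ => 1 } split ' ', $lines[$i];
        next LINE unless %set;    # ignore blank lines
        # The new set is contained in a kept set: it is redundant.
        for my $k (keys %kept) {
            next LINE if is_subset(\%set, $kept{$k}{set});
        }
        # A kept set is contained in the new set: drop the old one.
        for my $k (keys %kept) {
            delete $kept{$k} if is_subset($kept{$k}{set}, \%set);
        }
        $kept{$i} = { line => $lines[$i], set => \%set };
    }
    # Hash order is arbitrary, so sort by position to keep input order.
    return map { $kept{$_}{line} } sort { $a <=> $b } keys %kept;
}

# The example cluster file, deliberately shuffled so subsets come first.
my @example = ("5 7 8 11", "3 13 17 19", "2 5 7 8 11", "4 21 45 67", "1 2 5 7 8 11");
print "$_\n" for remove_redundant_any_order(@example);
# prints:
# 3 13 17 19
# 4 21 45 67
# 1 2 5 7 8 11
```

To run it on a real cluster file, read the lines with `chomp(my @lines = <>);` and pass `@lines` instead of `@example`. Each new line is compared against every set currently kept, which is fine for modest files.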

 -Heikki

On Wednesday 29 June 2005 10:00, Heikki Lehvaslaiho wrote:
> Vijayraj,
>
> Your problem, in mathematical terms, is comparing sets.
>
> In pseudocode:
>
> parse the first line, create a set, write the line
> add the set to an array
> for each subsequent line {
>     parse the line, create a set
>     for each old set in the array {
>         if this set is a subset of the old set {
>            skip to the next line
>         }
>     }
>     # if we are here, the set is not contained in any earlier set
>     add the set to the array, write the line
> }
>
> The output will contain only the non-redundant lines.
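The pseudocode above translates almost line for line into core Perl. Like the pseudocode, this sketch assumes that every superset appears before its subsets (true for the example file). The special case for the first line folds into the loop, since there is nothing yet to compare it with, and the labelled `next LINE` implements "skip to the next line" from inside the inner loop. The sub names are hypothetical; a module such as Set::Scalar could replace the hand-written `is_subset`.

```perl
use strict;
use warnings;

# True if every member of hash ref $small also exists in hash ref $big.
sub is_subset {
    my ($small, $big) = @_;
    for my $member (keys %$small) {
        return 0 unless exists $big->{$member};
    }
    return 1;
}

# Direct translation of the pseudocode: keep a line unless its set is a
# subset of a set already kept. Assumes supersets come before their subsets.
sub remove_redundant {
    my @lines = @_;
    my (@kept_sets, @output);
    LINE: for my $line (@lines) {
        my %set = map { $_ => 1 } split ' ', $line;
        next LINE unless %set;    # ignore blank lines
        for my $old (@kept_sets) {
            next LINE if is_subset(\%set, $old);    # "skip to the next line"
        }
        # If we are here, the set is not contained in any earlier set.
        push @kept_sets, \%set;
        push @output, $line;
    }
    return @output;
}

my @example = ("1 2 5 7 8 11", "2 5 7 8 11", "3 13 17 19", "4 21 45 67", "5 7 8 11");
print "$_\n" for remove_redundant(@example);
# prints:
# 1 2 5 7 8 11
# 3 13 17 19
# 4 21 45 67
```

Identical lines (even with members in a different order) count as subsets of each other, so only the first copy is kept.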
>
> There are many modules on CPAN that can do the set algebra for you. One of
> them is Set::Scalar: http://search.cpan.org/~jhi/Set-Scalar-1.19/
>
>
> Yours,
>  -Heikki
>
> On Wednesday 29 June 2005 08:19, vijayaraj nagarajan wrote:
> > Hi,
> > I have a cluster file with contents like this:
> >
> > 1 2 5 7 8 11
> > 2 5 7 8 11
> > 3 13 17 19
> > 4 21 45 67
> > 5 7 8 11
> >
> > Lines 1, 2 and 5 are redundant. I need to
> > remove lines 2 and 5 from the file and
> > keep only line 1, since line 1
> > contains all the members present in lines 2 and 5.
> >
> > Could anyone suggest how to parse this file
> > to remove such redundant lines using Perl?
> > Any help and suggestions in this regard would be
> > greatly appreciated.
> >
> > Thanks
> >
> > vijayaraj nagarajan
> > Research Assistant
> > The University of Southern Mississippi
> > MS, USA
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

