[Bioperl-l] Re: removing duplicate fasta records

Lincoln Stein lstein@cshl.org
Wed, 18 Dec 2002 09:05:18 -0500


Ummm, what's wrong with it?  It's a model of lucidity compared to awk.

My version was designed to show how to do this with bioperl, which I thought 
was what the original inquiry was asking.  You can do this with sort, uniq 
and sed too.
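
For what it's worth, the sort-based route might look something like the
sketch below. It is only an illustration, not the bioperl version discussed
in this thread, and it rests on some loud assumptions: every record is
exactly two lines (a ">" header followed by the whole sequence on one
line), headers contain no tab characters, and the function name dedup_fasta
is made up. It uses paste and tr rather than uniq and sed, and it keeps a
single header per sequence instead of concatenating the descriptions.

```sh
#!/bin/sh
# Sketch only: assumes two-line FASTA records and tab-free headers.
# dedup_fasta is a hypothetical name, not a bioperl or SEALS command.
dedup_fasta() {
    tab=$(printf '\t')
    paste - - |                               # "header<TAB>sequence", one record per line
        LC_ALL=C sort -t "$tab" -k2,2 -u |    # keep one record per distinct sequence
        tr '\t' '\n'                          # back to two-line FASTA
}

# Usage (records.fa and unique.fa are placeholder file names):
#   dedup_fasta < records.fa > unique.fa
```

Note that the output comes back ordered by sequence rather than in the
original file order, and which of several duplicate headers survives is
up to sort.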

Lincoln

On Wednesday 18 December 2002 07:07 am, Michal Kurowski wrote:
> Ewan Birney [birney@ebi.ac.uk] wrote:
> > > perl -ne 'BEGIN{$/=">";$"=";"}/(.*?)\n(.+?)\s*>?$/s && push
> > > @{$h{$2}},$1;END{for(keys%h){print ">@{$h{$_}}\n$_\n"}}'
> > >
> > > Which will remove redundant entries AND concatenate their description
> > > lines :-)
> >
> > The reason I ... despair .... of Perl is the ability to write things like
> > this (no offence Paul - I am sure you wouldn't write something like that
> > in a script using a production setting would you ;) ).
>
> Actually, that one-liner seems to be useful in some cases.
>
> > My Love-Hate relationship with Perl just deepens.
>
> We will have to love SEALS for some time for just that reason.
>
> Check it out:
> http://www.ncbi.nlm.nih.gov/CBBresearch/Walker/SEALS/index.html
>
> Cheers,

-- 
Lincoln Stein
lstein@cshl.org