[Bioperl-l] Perl script for sequence matching

Suraj Peri suraj_peri@yahoo.com
Fri, 14 Jul 2000 23:22:04 -0700 (PDT)




Hi all,

Thank you for the solutions. I am now stuck on finding the hits that are common to two files.

I have two files:

1. The first contains hits from BLAST [BLAST output].

2. The second contains hits with sequence accession numbers, z-scores, etc., for example:

 

DROSOPHILA:CG3727-FBAN0003727 + 7.41 128.40 548 ! last_updated:000321

 

With these two files in hand, I would like to pick out the hits they share [sequences that are reported in both file 1 and file 2].

I tried this using gawk, but I got all the hits from both files, and I am sure there are unique hits reported only in file 1 or only in file 2.

It would be a great help if anyone could suggest a script that reads both files and reports only the hits found in both file 1 and file 2.

Thank you...

Peri

Biotechnology centre

m.s.univ. of baroda

baroda india.
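
The comparison asked for above, reporting only the hits present in both files, can be sketched in Perl as a hash lookup. This is a minimal sketch, not a tested solution for real BLAST output: it assumes each file holds one hit per line and that the identifier is the first whitespace-separated field, as in the sample line (DROSOPHILA:CG3727-FBAN0003727). Raw BLAST output would first have to be reduced to that form. The sub names read_ids and common_ids are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a hash ref whose keys are the first whitespace-delimited
# field of every non-blank line in the given file.
# ASSUMPTION: that first field is the hit identifier.
sub read_ids {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my %ids;
    while (my $line = <$fh>) {
        my ($id) = split ' ', $line;    # split ' ' ignores leading whitespace
        $ids{$id} = 1 if defined $id;   # blank lines yield no field
    }
    close $fh;
    return \%ids;
}

# Return the sorted identifiers that occur in both hash refs.
sub common_ids {
    my ($first, $second) = @_;
    my @common = sort grep { exists $second->{$_} } keys %$first;
    return @common;
}

# When run as a script, expect exactly two file names.
unless (caller) {
    if (@ARGV == 2) {
        my ($first, $second) = map { read_ids($_) } @ARGV;
        print "$_\n" for common_ids($first, $second);
    }
    else {
        warn "Usage: $0 file1 file2\n";
    }
}
```

Saved under a name such as common_hits.pl (hypothetical), it would be run as "perl common_hits.pl file1 file2". Swapping "exists" for "!exists" in common_ids would instead list the hits found only in the first file.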


 
  

  Paul Gordon <gordonp@niji.imb.nrc.ca> wrote: 
Another one-liner, or at least close to it :-)

perl -ne 'BEGIN{$/=">";$"=";"}($d,$_)=/(.*?)\n(.+?)>?$/s;push
@{$h{lc()}},$d if $_;END{for(keys%h){print">@{$h{$_}}$_"}}'
filename1 filename2 ...
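
For readers who find the one-liner hard to follow, here is a longer sketch of the same idea: read FASTA records, group the description lines by lowercased sequence, and print each distinct sequence once with its descriptions joined by ";". It is a variation, not a line-by-line expansion. Unlike the one-liner, it strips line breaks from the sequence before comparing and prints a newline between the header and the sequence; both choices are assumptions about what is wanted. The sub names group_by_sequence and format_groups are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read FASTA text from a filehandle and record, for each lowercased
# sequence, the description lines that carry it. An existing hash ref
# may be passed in so that several files accumulate into one table.
sub group_by_sequence {
    my ($fh, $groups) = @_;
    $groups ||= {};
    local $/ = ">";                     # one record per ">" delimiter
    while (my $record = <$fh>) {
        chomp $record;                  # drop the trailing ">"
        my ($desc, $seq) = split /\n/, $record, 2;
        next unless defined $seq;       # skips the empty first record
        $seq =~ s/\s+//g;               # ASSUMPTION: ignore line wrapping
        next unless length $seq;
        push @{ $groups->{lc $seq} }, $desc;
    }
    return $groups;
}

# Build one FASTA record per distinct sequence, with all of its
# descriptions joined by ";". Sequences come out in lower case.
sub format_groups {
    my ($groups) = @_;
    my @records;
    for my $seq (sort keys %$groups) {
        my $header = join ';', @{ $groups->{$seq} };
        push @records, ">$header\n$seq\n";
    }
    return @records;
}

# When run as a script, treat every argument as a FASTA file.
unless (caller) {
    my $groups = {};
    for my $path (@ARGV) {
        open my $fh, '<', $path or die "Cannot open $path: $!";
        group_by_sequence($fh, $groups);
        close $fh;
    }
    print format_groups($groups);
}
```

Like the one-liner, this splits records on every ">" character, so a description line that itself contains ">" would be cut short.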

________________________________________________________________________
Paul Gordon Paul.Gordon@nrc.ca
Genomic Technologies http://maggie.cbr.nrc.ca
Institute for Marine Biosciences
National Research Council Canada

On Fri, 14 Jul 2000, Suraj Peri wrote:

> 
> Hi all,
> I am interested in a script to report the
> unique sequences from two files. It should not report
> the repetitions.
> I am in fact trying to write this, but my script is not
> working.
> I need it quickly, so can anyone please help? Is this
> available in BioPerl?
> 
> thanks.
> Peri.
> 
> __________________________________________________
> Do You Yahoo!?
> Get Yahoo! Mail - Free email you can access from anywhere!
> http://mail.yahoo.com/
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
> 
