[Bioperl-l] parse columns file

Mon Feb 14 12:10:46 UTC 2011

Hi Jordi,

You will want to use a *hash* to store your IDs if you want to make them
non-redundant: hash keys are unique, so an ID you have already stored is
easy to detect and skip. This is not BioPerl related - it's more of a
basic Perl question. My recommendation would be to get the excellent
"Beginning Perl for Bioinformatics" book from O'Reilly and also check out
the general Perl sites and forums, such as perlmonks.com. You can also
just search the web for Perl hashes, see if they make sense to you, and
try to work one into the example script you found already. Good luck!!
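
For what it's worth, here is a minimal sketch of the hash idea. It assumes
the file is tab-separated, that the ID is in the first column, and that you
want the first two fields of the first line seen for each ID. The sub name
first_per_id and the sample rows are made up for illustration, since the
real input was not shown in this thread.

```perl
use strict;
use warnings;

# Return the first two tab-separated fields of each line, keeping only
# the first line seen for any given value in column one.
sub first_per_id {
    my @lines = @_;
    my %seen;    # IDs encountered so far
    my @out;
    for my $line (@lines) {
        chomp $line;
        my ($id, $hit) = split /\t/, $line;
        next unless defined $id && defined $hit;    # skip blank or short lines
        next if $seen{$id}++;                       # ID already stored: skip it
        push @out, "$id\t$hit";
    }
    return @out;
}

# Made-up sample rows (ID, hit, third column); not the poster's real data.
my @sample = (
    "id1\thitA\t10\n",
    "id1\thitB\t8\n",
    "id2\thitA\t7\n",
);
print "$_\n" for first_per_id(@sample);
```

To run it on a real file, open the file with a lexical filehandle and pass
first_per_id(<$fh>) instead of the sample rows.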

Frank

On Mon, 2011-02-14 at 12:34 +0100, Jordi Durban wrote:
> Thank you very much John. The output should be the two fields from each
> entry. In the example above, it should be:
> uaccno=FF56QEU12HD1LC    gi|166216293|sp|P0C616.1|PA2HA_BOTAS
> uaccno=FF56QEU12HMBY2    gi|166216293|sp|P0C616.1|PA2HA_BOTAS
> uaccno=FF56QEU12HDB9V    gi|166215047|sp|P24605.3|PA2H2_BOTAS
> 
> According to http://perl.about.com/od/filesystem/a/perl_parse_tabs.htm I
> have to do:
> 
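> open(my $fh, '<', 'data.txt') or die "Cannot open data.txt: $!";
> while (my $line = <$fh>) {
>     chomp $line;
>     # Split each line on tabs into its three fields
>     my ($name, $email, $phone) = split /\t/, $line;
>     print "Name: $name\n";
>     print "Email: $email\n";
>     print "Phone: $phone\n";
> 
>     print "---------\n";
> }
> 
> close($fh);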
> 
> But this script doesn't deal with the duplicated lines...
> 
> 2011/2/14 John SJ Anderson <genehack at genehack.org>
> 
> > On Mon, Feb 14, 2011 at 05:41, Jordi Durban <jordi.durban at gmail.com>
> > wrote:
> > > Hi all!
> > > I'm trying to parse a three-column file. Values in the first column
> > > can be repeated; however, I would like to obtain the results for the
> > > first one only.
> > [ snip ]
> >
> > You've given us the input. What should the output look like?
> >
> > j.
> >
> 
> 
> 
