[Bioperl-l] parse columns file

Jordi Durban jordi.durban at gmail.com
Mon Feb 14 07:27:58 EST 2011


Ok, that's true. I suspected it was a "hash" question, but I don't know
much about hashes and I expected to find some BioPerl scripts to do that.
Thank you very much.

2011/2/14 Frank Schwach <fs5 at sanger.ac.uk>

> Hi Jordi,
>
> you will want to use a *hash* to store your IDs if you want to make them
> non-redundant. This is not BioPerl related - it's more of a basic Perl
> question. My recommendation would be to get the excellent "Beginning
> Perl for Bioinformatics" book from O'Reilly and also check out the
> general-Perl sites and forums, such as perlmonks.com. You can also just
> google for Perl hashes and see if it makes sense to you and try to work
> it into the example script you found already. Good luck!!
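>
> To illustrate the hash idea, here is a minimal sketch rather than a
> definitive solution. It assumes the input is tab-separated, that the
> first column is the ID to make non-redundant, and that only the first
> row seen for each ID should be kept. The name unique_rows and the
> sample rows are hypothetical, for illustration only.
>
> ```perl
> use strict;
> use warnings;
>
> # Keep only the first row seen for each value of the first column.
> # Assumption: rows are tab-separated and the first column is the ID.
> sub unique_rows {
>     my @lines = @_;
>     my %seen;    # ID => number of times that ID has been seen so far
>     my @kept;
>     for my $line (@lines) {
>         chomp $line;
>         my ($id, $hit) = split /\t/, $line;
>         next unless defined $hit;    # skip empty or one-column lines
>         next if $seen{$id}++;        # skip IDs that already appeared
>         push @kept, "$id\t$hit";     # keep the first two fields only
>     }
>     return @kept;
> }
>
> # Hypothetical input rows, for illustration only.
> my @rows = (
>     "idA\thit1\t0.001\n",
>     "idA\thit2\t0.5\n",
>     "idB\thit1\t0.01\n",
> );
> print "$_\n" for unique_rows(@rows);
> ```
>
> With the sample rows above, the second idA row is dropped and one row
> each for idA and idB is printed. To run this on a real file, the lines
> read from the filehandle could be passed to unique_rows in place of
> @rows.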
>
> Frank
>
>
> On Mon, 2011-02-14 at 12:34 +0100, Jordi Durban wrote:
> > Thank you very much John. The output should be the two fields from each
> > entry. In the example above, it should be:
> > uaccno=FF56QEU12HD1LC    gi|166216293|sp|P0C616.1|PA2HA_BOTAS
> > uaccno=FF56QEU12HMBY2    gi|166216293|sp|P0C616.1|PA2HA_BOTAS
> > uaccno=FF56QEU12HDB9V    gi|166215047|sp|P24605.3|PA2H2_BOTAS
> >
> > According to http://perl.about.com/od/filesystem/a/perl_parse_tabs.htm I
> > have to do:
> >
> > open(my $fh, '<', 'data.txt') or die "Cannot open data.txt: $!";
> > while (<$fh>) {
> >     chomp;
> >     # split each tab-separated line into its three fields
> >     my ($name, $email, $phone) = split /\t/;
> >     print "Name: $name\n";
> >     print "Email: $email\n";
> >     print "Phone: $phone\n";
> >
> >     print "---------\n";
> > }
> >
> > close($fh);
> >
> > But this script doesn't deal with the duplicated lines...
> >
> > 2011/2/14 John SJ Anderson <genehack at genehack.org>
> >
> > > On Mon, Feb 14, 2011 at 05:41, Jordi Durban <jordi.durban at gmail.com>
> > > wrote:
> > > > Hi all!
> > > > I'm trying to parse a three-column file. The values in the first
> > > > column can be repeated.
> > > > However, I would like to obtain the results for the first one.
> > > [ snip ]
> > >
> > > You've given us the input. What should the output look like?
> > >
> > > j.
> > >
> >
> >
> >



-- 
Jordi


