[Bioperl-l] Bioperl-l Digest, Vol 114, Issue 11 regex in message 2

Tom Keller kellert at ohsu.edu
Sat Oct 20 17:16:15 UTC 2012


The regex is clever and shows the power of regular expressions and Perl. Basically, the capturing parens contain a negated character class, so it says: after "gene=", save every character that is not ']' up to the next ']', which is exactly what you said you wanted.

But I think there is a typo: the -s should be -e

Thanks for the nice help, Jason.
Tom
OHSU, Portland, OR

On Oct 20, 2012, at 9:00 AM, <bioperl-l-request at lists.open-bio.org> wrote:

Message: 2
Date: Fri, 19 Oct 2012 23:43:29 -0600
From: Jason Stajich <jason.stajich at gmail.com>
Subject: Re: [Bioperl-l] how to rename genbank header in fasta file?
To: yang liu <yang.liu0508 at gmail.com>
Cc: bioperl-l at lists.open-bio.org
Message-ID: <5611663A-0073-4D26-9DDF-D01BAFDCDC5D at gmail.com>
Content-Type: text/plain; charset=us-ascii

Are you parsing exactly this file? It is in FASTA format, not GenBank.

You don't need bioperl for this:
perl -i -p -s 's/>.+\[gene=([^\]]+)\].+/>$1/' file.fa

I'd read up on regular expressions and Perl to learn more about string replacement and how to do this better.




