[Bioperl-l] modify sequence names

Smithies, Russell Russell.Smithies at agresearch.co.nz
Sun May 20 20:57:24 UTC 2012


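Or a Perl in-place replace, which saves on temp files. Give it the file to edit as the last argument (your_sequences.fasta below is only a placeholder name):

perl -pi -e 's/^>.*\[gene=([^]]+).*$/>$1/' your_sequences.fasta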


--Russell

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Adam Sjøgren
Sent: Sunday, 20 May 2012 3:13 a.m.
To: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] modify sequence names

On Sat, 19 May 2012 10:34:04 -0400, yang wrote:

> Would anyone please help me to modify sequence names with bioperl? I 
> am editing them manually now, is there an easier way?

You don't need BioPerl specifically to do simple text manipulation.

>> lcl|NC_017840.1_cdsid_YP_006280919.1 [gene=cox1] [protein=cytochrome
>> coxidase subunit 1] [protein_id=YP_006280919.1] [location=1..1584]

[... to ...]

>> cox1

Maybe you can use something like:

  $ sed 's/^>.*\[gene=\([^]]*\)\].*$/>\1/'
  >lcl|NC_017840.1_cdsid_YP_006280919.1 [gene=cox1] [protein=cytochrome coxidase subunit 1] [protein_id=YP_006280919.1] [location=1..1584]
  ATGACAAATCCGGTCCGATGGCTGTTCTCCACTAACCACAAGGATATAGGTACTCTATATTTCATCTTCG
  GTGCCATTGCTGGAGTGATGGGCACATGCTTCTCAGTACTGATTCGTATGGAATTAGCACGACCCGGCGA
  TCAAATTCTTGGTGGGAATCATCAACTTTATAATGTTTTAATAACGGCTCACGCTTTTTTAATGATCTTT
  >cox1
  ATGACAAATCCGGTCCGATGGCTGTTCTCCACTAACCACAAGGATATAGGTACTCTATATTTCATCTTCG
  GTGCCATTGCTGGAGTGATGGGCACATGCTTCTCAGTACTGATTCGTATGGAATTAGCACGACCCGGCGA
  TCAAATTCTTGGTGGGAATCATCAACTTTATAATGTTTTAATAACGGCTCACGCTTTTTTAATGATCTTT
  $ 

If you need to use Perl rather than sed, you can use:

  $ perl -pe 's/^>.*\[gene=([^]]+).*$/>$1/'

instead.

The easiest way is probably to learn a little programming and/or regular expressions.

Learning Perl by Randal L. Schwartz, brian d foy, and Tom Phoenix could be a starting point, as could many online tutorials.


  Best regards,

    Adam

-- 
 "Hur långt man än har kommit                                 Adam Sjøgren
  är det alltid längre kvar"                             asjo at koldfront.dk
