[Bioperl-l] modify sequence names

Adam Sjøgren asjo at koldfront.dk
Sat May 19 15:13:03 UTC 2012


On Sat, 19 May 2012 10:34:04 -0400, yang wrote:

> Could anyone please help me modify sequence names with BioPerl? I am
> editing them manually now; is there an easier way?

You don't need BioPerl specifically to do simple text manipulation.

>> lcl|NC_017840.1_cdsid_YP_006280919.1 [gene=cox1] [protein=cytochrome
>> coxidase subunit 1] [protein_id=YP_006280919.1] [location=1..1584]

[... to ...]

>> cox1

Maybe you can use something like:

  $ sed 's/^>.*\[gene=\([^]]*\)\].*$/>\1/'
  >lcl|NC_017840.1_cdsid_YP_006280919.1 [gene=cox1] [protein=cytochrome coxidase subunit 1] [protein_id=YP_006280919.1] [location=1..1584]
  ATGACAAATCCGGTCCGATGGCTGTTCTCCACTAACCACAAGGATATAGGTACTCTATATTTCATCTTCG
  GTGCCATTGCTGGAGTGATGGGCACATGCTTCTCAGTACTGATTCGTATGGAATTAGCACGACCCGGCGA
  TCAAATTCTTGGTGGGAATCATCAACTTTATAATGTTTTAATAACGGCTCACGCTTTTTTAATGATCTTT
  >cox1
  ATGACAAATCCGGTCCGATGGCTGTTCTCCACTAACCACAAGGATATAGGTACTCTATATTTCATCTTCG
  GTGCCATTGCTGGAGTGATGGGCACATGCTTCTCAGTACTGATTCGTATGGAATTAGCACGACCCGGCGA
  TCAAATTCTTGGTGGGAATCATCAACTTTATAATGTTTTAATAACGGCTCACGCTTTTTTAATGATCTTT
  $ 

If you need to use Perl rather than sed, you can use:

  $ perl -pe 's/^>.*\[gene=([^]]+).*$/>$1/'

instead.
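Both one-liners read standard input, so they work just as well on a whole
file. As a rough sketch (assuming NCBI-style headers like the one above;
the function name rename_headers is made up for illustration), the sed
version could be wrapped in a small shell function, with the regular
expression taken apart in comments:

```shell
# Rename FASTA headers to the value of their [gene=...] tag (a sketch,
# assuming NCBI-style headers such as
#   >lcl|NC_017840.1_cdsid_YP_006280919.1 [gene=cox1] [protein=...]).
# rename_headers is a hypothetical name. Sequence lines, and headers
# without a [gene=...] tag, pass through unchanged.
rename_headers() {
    # ^>           only header lines start with '>'
    # .*           skip ahead to the tag
    # \[gene=      the literal text "[gene="
    # \([^]]*\)    capture everything up to the next ']' (the gene name)
    # \].*$        match that ']' and the rest of the line
    # >\1          replace the whole line with '>' plus the captured name
    sed 's/^>.*\[gene=\([^]]*\)\].*$/>\1/'
}
```

You would then run something like "rename_headers < cds.fasta >
renamed.fasta" (both file names hypothetical). If several sequences carry
the same gene name you will end up with duplicate names, which
"grep '^>' renamed.fasta | sort | uniq -d" will list.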

The easiest way is probably to learn a little programming and/or regular
expressions.

"Learning Perl" by Randal L. Schwartz, brian d foy, and Tom Phoenix could
be a starting point, as could many online tutorials.


  Best regards,

    Adam

-- 
 "However far you have come,                                  Adam Sjøgren
  there is always further to go"                         asjo at koldfront.dk
