[Bioperl-l] modify sequence names

yang liu yang.liu0508 at gmail.com
Sat May 19 14:34:04 UTC 2012


Dear colleagues,

Could anyone please help me modify sequence names with BioPerl? I am
editing them by hand at the moment; is there an easier way?
I have a batch of sequences in this format:

>lcl|NC_017840.1_cdsid_YP_006280919.1 [gene=cox1] [protein=cytochrome c oxidase subunit 1] [protein_id=YP_006280919.1] [location=1..1584]
ATGACAAATCCGGTCCGATGGCTGTTCTCCACTAACCACAAGGATATAGGTACTCTATATTTCATCTTCG
GTGCCATTGCTGGAGTGATGGGCACATGCTTCTCAGTACTGATTCGTATGGAATTAGCACGACCCGGCGA
TCAAATTCTTGGTGGGAATCATCAACTTTATAATGTTTTAATAACGGCTCACGCTTTTTTAATGATCTTT

>lcl|NC_017840.1_cdsid_YP_006280920.1 [gene=ccmFn] [protein=cytochrome c biogenesis FN] [protein_id=YP_006280920.1] [location=2225..3940]
ATGTCAATAAATGCATTTTCTCATTATTCGTTCTTTCCGGGTCTTTTCGTTGCATTCACTTACAACAAGA
AAGAACCACCAGCGTTTGGTGCAGCCCCTGCATTTTGGTGCATTCTTCTTTCTTTCCTTGGTCTTTCGTT
CCGTCATATTCCTAATAACTTATCCAATTACAGCGTATTAACCGCTAATGCACCTTTCTTTTATCAAATC

I would like to keep only the gene name, that is, the word that follows
"gene=" in each header, like this:
>cox1
ATGACAAATCCGGTCCGATGGCTGTTCTCCACTAACCACAAGGATATAGGTACTCTATATTTCATCTTCG
GTGCCATTGCTGGAGTGATGGGCACATGCTTCTCAGTACTGATTCGTATGGAATTAGCACGACCCGGCGA
TCAAATTCTTGGTGGGAATCATCAACTTTATAATGTTTTAATAACGGCTCACGCTTTTTTAATGATCTTT

>ccmFn
ATGTCAATAAATGCATTTTCTCATTATTCGTTCTTTCCGGGTCTTTTCGTTGCATTCACTTACAACAAGA
AAGAACCACCAGCGTTTGGTGCAGCCCCTGCATTTTGGTGCATTCTTCTTTCTTTCCTTGGTCTTTCGTT
CCGTCATATTCCTAATAACTTATCCAATTACAGCGTATTAACCGCTAATGCACCTTTCTTTTATCAAATC
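
For illustration, the renaming described above could be sketched roughly as
follows. This is plain Python rather than BioPerl, and it assumes that each
header sits on a single line in the actual file and carries a "[gene=...]"
tag. The names rename_headers and rename_file are made up for this sketch.

```python
import re

# Matches the "[gene=NAME]" tag in a FASTA header and captures NAME.
GENE_TAG = re.compile(r"\[gene=([^\]]+)\]")


def rename_headers(lines):
    """Yield FASTA lines with each header reduced to '>' plus its gene name.

    Assumption: every header is on one line. Headers without a
    [gene=...] tag are passed through unchanged; sequence lines are
    never modified.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            match = GENE_TAG.search(line)
            if match:
                line = ">" + match.group(1)
        yield line


def rename_file(in_path, out_path):
    """Hypothetical helper: read in_path, write renamed records to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in rename_headers(src):
            dst.write(line + "\n")
```

Calling rename_file("in.fasta", "out.fasta") (both file names are
placeholders) would write the renamed records to a new file and leave the
input untouched. The same regular expression could probably be applied to the
description of each record read through Bio::SeqIO, if a BioPerl-only
solution is preferred.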

Any help would be appreciated. Thanks,

Yang.


