[Bioperl-l] extract sequences and save into files by genes

Cook, Malcolm MEC at stowers.org
Tue Feb 28 16:55:13 UTC 2012


Yang,

I'm replying back on-list.

You wrote in your other email that my one-liner worked once you learned to run Perl from the command line under Cygwin.

Great.  Glad to help.  Good luck.  Welcome to the fray!

~Malcolm

From: yang liu [mailto:yang.liu0508 at gmail.com]
Sent: Monday, February 27, 2012 10:04 PM
To: Cook, Malcolm
Subject: Re: [Bioperl-l] extract sequences and save into files by genes

Hello Malcolm,

Thanks for your help. But when I ran it, I got the following error:
'\.txt' is not recognized as an internal or external command, operable program or batch file.

I am using Windows 7; is that the problem? I have Perl installed.
In the Windows command prompt, I first changed to the folder where the target files are, and then pasted in your command.
I am a Perl beginner.

Thanks again for your help.

Yang.


On Mon, Feb 27, 2012 at 10:47 AM, Cook, Malcolm <MEC at stowers.org> wrote:
You don't need BioPerl for this one.

The following Perl one-liner will do it for you (run it from a Unix-style shell such as bash):

perl -p -e 'if (1==$.) {($species = $ARGV) =~ s|\.txt||}; if (s/^>(.*)/">${species}"/e) {$gene=$1; open($O{$gene},qq{>> ${gene}.txt}); select($O{$gene})} ; close ARGV  if eof' *.txt


~Malcolm
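A sketch of the same logic as a standalone Perl script, for anyone who cannot use a Unix-style shell. The one-liner depends on bash-style single quotes. cmd.exe on Windows does not treat ' as a quoting character, so the | characters in s|\.txt|| are read as pipes and cmd tries to run \.txt as a command, which is consistent with the "'\.txt' is not recognized" error reported in this thread. cmd.exe also does not expand *.txt, so the script expands wildcards itself with glob. The script name split_by_gene.pl and the sub split_by_gene are hypothetical. Unlike the one-liner, this sketch takes only the first word of each header as the gene name, strips CRLF line endings, and skips any lines before the first header.

```perl
#!/usr/bin/perl
# split_by_gene.pl (hypothetical name): a sketch of the one-liner as a script.
# Assumption: each input file is FASTA named <species>.txt with headers
# ">gene"; each record is appended to <gene>.txt in the current folder,
# with its header replaced by ">species".
use strict;
use warnings;
use File::Basename qw(fileparse);

sub split_by_gene {
    my @files = @_;
    my %out;    # gene name => append-mode output handle, opened once per gene
    for my $file (@files) {
        # Species = file name without its directory and ".txt" suffix
        my ($species) = fileparse($file, qr/\.txt/i);
        open(my $in, '<', $file) or die "Cannot read $file: $!\n";
        my $fh;    # handle of the gene whose sequence lines are being copied
        while (my $line = <$in>) {
            $line =~ s/\r?\n\z//;    # strip LF or CRLF line ending
            if ($line =~ /^>\s*(\S+)/) {
                my $gene = $1;       # first word of the header (assumption)
                if (!$out{$gene}) {
                    open($out{$gene}, '>>', "$gene.txt")
                        or die "Cannot append to $gene.txt: $!\n";
                }
                $fh = $out{$gene};
                print {$fh} ">$species\n";
            }
            elsif (defined $fh) {
                print {$fh} "$line\n";    # sequence line, copied unchanged
            }
            # lines before the first header are skipped
        }
        close $in;
    }
    close $_ for values %out;
}

# cmd.exe passes "*.txt" through literally, so expand wildcards here;
# names already expanded by a Unix shell come back from glob unchanged.
split_by_gene(map { glob($_) } @ARGV) if @ARGV;
```

Usage, from bash or cmd.exe in the folder holding the species files: perl split_by_gene.pl *.txt. Like the one-liner, it appends to <gene>.txt in the current folder, so a second run there would also read the generated gene files (they match *.txt) and duplicate records; delete the outputs, or keep the species files in their own folder, before rerunning.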

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org]
> On Behalf Of yang liu
> Sent: Saturday, February 25, 2012 12:52 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] extract sequences and save into files by genes
>
> Dear colleagues,
>
> I have multiple files, each named after a species. Each file contains ca. 100
> different genes. I want to extract the sequences and save them into one file
> per gene. In each output file, the sequence header should be the species name
> instead of the gene name. How should I do this?
>
> The input files look like this (file names Acidosasa.txt,
> Acorus.txt, ...):
>
> >rps12
> ATGCCAACGGTTAAACAACTTATTAGAAACGCAAGACAGCCAATACGAAATGCT
> AGAAAATCGCCCGCGC
> TTAAGGGATGTCCTCAGCGTCGAGGAACATGTGCTAGGGTGTATACTATCAACCC
> CAAAAAACCCAACTC
> >psbA
> TTATCCATTAAGAGATGGAACTTCAAGAACAGCTAGGTCTAGAGGGAAGTTGTG
> AGCATTACGTTCGTGC
> ATTACCTCCATACCAAGATTAGCACGGTTGATGATATCAGCCCAAGTATTAATAAC
> GCGACCTTGGCTAT
> .....
>
> I would like the output files to be named rps12.txt, psbA.txt, ...
>
> Within rps12.txt, the sequences should look like this:
>
> >Acidosasa
> ATGCCAACGGTTAAACAACTTATTAGAAACGCAAGACAGCCAATACGAAATGCT
> AGAAAATCGCCCGCGC
> TTAAGGGATGTCCTCAGCGTCGAGGAACATGTGCTAGGGTGTATACTATCAACCC
> CAAAAAACCCAACTC
> >Acorus
> ATGCCAACTATTAAACAACTTATTAGAAACACAAGACAGCCAATCCGAAATGTC
>
> I do not know if I expressed this clearly.
>
> Thanks.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l