[Bioperl-l] Stockholm to fasta
Chris Fields
cjfields at illinois.edu
Tue Sep 22 17:13:53 UTC 2009
The POD for Bio::AlignIO::stockholm indicates where the various bits
of header information are stored. Everything from the header should be
stored in the returned alignment in the latest BioPerl; in many cases
it's not ideally stored, but it's accessible.
You'll need to preprocess the seqs in the SimpleAlign that is returned
(iterate through them and change the relevant bits like desc(),
displayname(), seq_id, etc.) and may need to make other modifications,
but it should work.
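
As a rough illustration of that preprocessing idea, here is a minimal
sketch. It is plain Python rather than BioPerl, it parses the Stockholm
text directly instead of going through AlignIO/SimpleAlign, and the
function names parse_stockholm and to_fasta are made up for this
example. It assumes the #=GF AC and DE fields are what you want in each
FASTA header and that gap characters ('.' and '-') should be dropped.

```python
def parse_stockholm(text):
    """Return (gf, seqs): #=GF header fields and the aligned sequences."""
    gf = {}    # tag -> text; repeated tags (e.g. CC) are joined with spaces
    seqs = {}  # name -> aligned sequence, kept in file order
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "//":
            continue
        if line.startswith("#=GF"):
            parts = line.split(None, 2)  # "#=GF", tag, free text
            if len(parts) == 3:
                tag, value = parts[1], parts[2]
                gf[tag] = gf[tag] + " " + value if tag in gf else value
            continue
        if line.startswith("#"):
            continue  # format line and #=GS/#=GR/#=GC are ignored here
        parts = line.split(None, 1)      # sequence name, aligned residues
        if len(parts) == 2:
            name, residues = parts
            # Interleaved files repeat a name once per block, so append.
            seqs[name] = seqs.get(name, "") + residues.replace(" ", "")
    return gf, seqs


def to_fasta(gf, seqs, strip_gaps=True):
    """Build FASTA text whose headers carry the family AC and DE fields."""
    desc = " ".join(v for v in (gf.get("AC"), gf.get("DE")) if v)
    records = []
    for name, residues in seqs.items():
        if strip_gaps:
            residues = residues.replace(".", "").replace("-", "")
        header = (">" + name + " " + desc).rstrip()
        records.append(header + "\n" + residues)
    return "\n".join(records) + "\n"


example = """# STOCKHOLM 1.0
#=GF AC RF00003
#=GF ID U1
#=GF DE U1 spliceosomal RNA
X58845.1/1-161  ..ACUUACCUGGCUGG.AGUUU.GCUA...UCGAUCAU.GAAG.GGUAG.
//
"""

gf, seqs = parse_stockholm(example)
fasta = to_fasta(gf, seqs)
print(fasta, end="")
```

The header for the one sequence in the example comes out as
">X58845.1/1-161 RF00003 U1 spliceosomal RNA". Pass strip_gaps=False
if you want to keep the alignment columns in the output.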
chris
On Sep 22, 2009, at 11:48 AM, shalabh sharma wrote:
> Hi All, I am trying to convert Stockholm to FASTA format. I am using
> "sreformat" for this purpose. I am getting a FASTA file, but the
> problem is that I want the header information from the Stockholm file
> in my FASTA file.
> Like:
> # STOCKHOLM 1.0
>
> #=GF AC RF00003
> #=GF ID U1
> #=GF DE U1 spliceosomal RNA
> - - - - - - - - - - - - - -
> - - - - - - - - - - - -- -
> - - - - - - -- - - - - -
> #=GF RL J Biol Chem 2001;276:21476-21481.
> #=GF CC U1 is a small nuclear RNA (snRNA) component of the spliceosome
> #=GF CC (involved in pre-mRNA splicing). Its 5' end forms complementary
> #=GF CC base pairs with the 5' splice junction, thus defining the 5'
> #=GF CC donor site of an intron.
> #=GF CC There are significant differences in sequence and secondary
> #=GF CC structure between metazoan and yeast U1 snRNAs, the latter being
> #=GF CC much longer (568 nucleotides as compared to 164 nucleotides in
> #=GF CC human). Nevertheless, secondary structure predictions suggest
> #=GF CC that all U1 snRNAs share a 'common core' consisting of helices I,
> #=GF CC II, the proximal region of III, and IV [1].
> #=GF CC This family does not contain the larger yeast sequences.
> #=GF SQ 100
>
>
> X63783.1/2024-2186  UUACUUACCUGGCUGG.AGUUU.GCUA...UCGAUCAU.GAAG.GGUAG.
> X63783.1/1394-1556  UUACUUACCUGGCUGG.AGUUA.GCUA...UCGAUCAU.GAAG.GGUAG.
> X58845.1/1-161      ..ACUUACCUGGCUGG.AGUUU.GCUA...UCGAUCAU.GAAG.GGUAG.
> X63783.1/596-756    UAAAUUACAAUGUUGU.AGUUA.GCUA...UAUAUCAA.AAAA.UAUAG.
> M29062.1/238-387    UUACUUACCUGGCAUG.AGUUU..CUG...CAGCACAA.GAAU.UGUGG.
>
> As output I am just getting a FASTA file with headers like
> "X63783.1/2024-2186", but I want the headers to include some
> information like "U1" or "U1 spliceosomal RNA" from the Stockholm
> headers.
>
> I would really appreciate it if anyone could help me out.
>
> Thanks
> Shalabh