[Bioperl-l] Fwd: help parsing msf file or clustalW file reports
Jason Stajich
jason at bioperl.org
Fri Sep 4 16:04:05 UTC 2009
Paola - it is important to keep emailing the mailing list for help.
I'm hoping someone else on the list can help, as I am swamped right
now.
-jason
Begin forwarded message:
> From: Paola Bisignano <paola_bisignano at yahoo.it>
> Date: September 4, 2009 5:48:22 AM PDT
> To: Jason Stajich <jason at bioperl.org>
> Subject: Re: [Bioperl-l] help parsing msf file or clustalW file
> reports
>
> Hi Jason, thanks for your answer. I have spent two days restudying
> the BioPerl synopsis and object-oriented programming. I understand
> what you mean, but I still have some problems: I don't actually know
> how to start parsing this kind of file. I generated this MSF (or
> ClustalW) file by parsing a FASTA file of multiple paired sequences
> and writing out only the pairs I want, that is, homologous proteins
> that have the same ligand in the PDB.
>
> I have a problem with parsing the MSF file: I can't find the right
> method of Bio::SimpleAlign for my case. I have to identify residues
> (from a list) in the aligned sequences. After parsing the alignment
> from the FASTA file I save it as an MSF file, and in that file I have
> to identify my residue (from the list, numbered as in the PDB file)
> and the residue aligned with it in the other sequence.
>
> This is a piece of the file:
>
> NoName MSF: 2 Type: P Wed Aug 26 10:32:50 2009 Check: 00 ..
>
> Name: Sequence/23-178  Len: 156  Check: 8937  Weight: 1.00
> Name: 2zhz:A/1-148     Len: 156  Check: 9006  Weight: 1.00
>
> //
>
>                    1                                                   50
> Sequence/23-178    NDPRVAAYGE VDELNSWVGY TKSLINSHTQ VLSNELEEIQ QLLFDCGHDL
> 2zhz:A/1-148       DDARIAAIGD VDELNSQIGV L--LAEPLPD DVRAALSAIQ HDLFDLGGEL
>
>                    51                                                 100
> Sequence/23-178    ATPADDERHS FKFKQEQPTV WLEEKIDNYT QVVPAVKKHI LPGGTQLASA
> 2zhz:A/1-148       CIPGHAAITD AHLARLDG-- WLA----HYN GQLPPLEEFI LPGGARGAAL
>
>                    101                                                150
> Sequence/23-178    LHVARTITRR AERQIVQLMR EEQINQDVLI FINRLSDYFF AAARYANYLE
> 2zhz:A/1-148       AHVCRTVCRR AERSIVALGA SEPLNAAPRR YVNRLSDLLF VLARVLNRAA
>
>                    151                                                200
> Sequence/23-178    QQPDML
> 2zhz:A/1-148       GGADVL
>
> For example, here I have to identify the residue in 2zhz:A that is
> aligned with Val 28 of Sequence (counting manually, it is Ile 5).
> Tyr4 has no aligned residue, because the alignment starts from N23
> of Sequence.
>
> How can I enter a residue of my sequence and extract the aligned
> residue from the other one?
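
A minimal sketch of the lookup asked about here, in plain Perl with no
BioPerl calls, just to show the gap arithmetic. It hard-codes the first
50 columns of the MSF excerpt; the helper names (column_for_residue,
residue_at_column, facing_residue) are made up for illustration. It
assumes the numbering follows the ranges in the sequence names (Sequence
starts at 23, 2zhz:A at 1). Counted that way, the Val facing Ile 5 is
residue 27, so the sketch queries 27; if the PDB file numbers it 28, that
offset would have to be applied first.

```perl
use strict;
use warnings;

# First 50 columns of the two rows in the MSF excerpt. "start" is the number
# of the first residue, taken from the /23-178 and /1-148 name suffixes.
my %aln = (
    'Sequence' => { start => 23,
                    seq   => 'NDPRVAAYGEVDELNSWVGYTKSLINSHTQVLSNELEEIQQLLFDCGHDL' },
    '2zhz:A'   => { start => 1,
                    seq   => 'DDARIAAIGDVDELNSQIGVL--LAEPLPDDVRAALSAIQHDLFDLGGEL' },
);

# Residue number -> 1-based alignment column (undef if not in this row).
sub column_for_residue {
    my ($row, $resnum) = @_;
    my $n     = $row->{start} - 1;       # number of the residue "before" column 1
    my @chars = split //, $row->{seq};
    for my $col (1 .. @chars) {
        next if $chars[$col - 1] =~ /[-.]/;   # a gap uses a column but no number
        return $col if ++$n == $resnum;
    }
    return;
}

# Alignment column -> (letter, residue number); empty list if the column is a gap.
sub residue_at_column {
    my ($row, $col) = @_;
    return if $col < 1 || $col > length $row->{seq};
    my $char = substr $row->{seq}, $col - 1, 1;
    return if $char =~ /[-.]/;
    my $prefix = substr $row->{seq}, 0, $col;
    my $gaps   = ($prefix =~ tr/-.//);   # count gap characters up to this column
    return ($char, $row->{start} + $col - $gaps - 1);
}

# Residue in one row -> aligned residue in the other row.
sub facing_residue {
    my ($from, $to, $resnum) = @_;
    my $col = column_for_residue($aln{$from}, $resnum) or return;
    return residue_at_column($aln{$to}, $col);
}

my ($aa, $num) = facing_residue('Sequence', '2zhz:A', 27);
print defined $aa ? "$aa$num\n" : "no aligned residue\n";   # prints "I5"
```

Asking for residue 4 (Tyr4) returns an empty list, because the row only
starts at residue 23; asking for residue 44 also returns an empty list,
because 2zhz:A has a gap in that column.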
>
> Best wishes to you all, dear friends. I'm really in trouble with
> this.
>
> Thanks for any suggestions.
>
> --- On Tue 1/9/09, Jason Stajich <jason at bioperl.org> wrote:
>
> From: Jason Stajich <jason at bioperl.org>
> Subject: Re: [Bioperl-l] help parsing msf file or clustalW file
> reports
> To: "Paola Bisignano" <paola_bisignano at yahoo.it>
> Cc: bioperl-l at lists.open-bio.org
> Date: Tuesday, 1 September 2009, 17:49
>
> I think you might want to use the column_from_residue_number method
> that is part of Bio::SimpleAlign - it lets you get the column from
> an alignment based on the sequence residue, doing some math along
> the way to deal with gaps. That is the residue -> alignment
> direction. If you are starting at the alignment and want to get the
> residue's position you will use the location_from_column on a
> particular sequence so
>
> # select somehow a sequence from the alignment, e.g.
> my $seq = $aln->get_seq_by_pos(1);
> #$loc is undef or Bio::LocationI object
> my $loc = $seq->location_from_column(5);
>
> -jason
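
The two calls can be chained to go from a residue in one sequence to
the aligned residue in the other. A hedged sketch, not a definitive
recipe: it builds the first 30 columns of Paola's MSF alignment in
memory instead of reading a file, and facing_residue is a hypothetical
helper name, not a BioPerl method. It assumes a BioPerl version that
provides get_seq_by_id on Bio::SimpleAlign, and that
column_from_residue_number throws when the residue number lies outside
the sequence's range, hence the eval.

```perl
use strict;
use warnings;
use Bio::LocatableSeq;
use Bio::SimpleAlign;

# First 30 columns of the MSF excerpt, with start numbers from the names.
my $aln = Bio::SimpleAlign->new;
$aln->add_seq(Bio::LocatableSeq->new(
    -id => 'Sequence', -start => 23, -alphabet => 'protein',
    -seq => 'NDPRVAAYGEVDELNSWVGYTKSLINSHTQ'));
$aln->add_seq(Bio::LocatableSeq->new(
    -id => '2zhz:A', -start => 1, -alphabet => 'protein',
    -seq => 'DDARIAAIGDVDELNSQIGVL--LAEPLPD'));

# Residue number in $from_id -> (letter, residue number) in $to_id,
# or an empty list if there is no aligned residue.
sub facing_residue {
    my ($aln, $from_id, $to_id, $resnum) = @_;

    # residue -> column
    my $col = eval { $aln->column_from_residue_number($from_id, $resnum) };
    return unless defined $col;

    # column -> location in the other sequence
    my $target = $aln->get_seq_by_id($to_id) or return;
    my $loc    = $target->location_from_column($col);

    # undef or an IN-BETWEEN location means the column is a gap here
    return unless $loc && $loc->location_type eq 'EXACT';
    return (substr($target->seq, $col - 1, 1), $loc->start);
}

my ($aa, $num) = facing_residue($aln, 'Sequence', '2zhz:A', 27);
print defined $aa ? "$aa$num\n" : "no aligned residue\n";
```

With a real file, $aln would instead come from
Bio::AlignIO->new(-file => ..., -format => 'msf')->next_aln.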
>
> On Sep 1, 2009, at 5:20 AM, Paola Bisignano wrote:
>
>> Hi,
>>
>> I'm trying to parse FASTA files that contain pairwise alignments. I
>> need to identify particular residues in each alignment; the residues
>> come from separate lists derived from parsing LIGPLOT files. So I
>> have to manipulate strings, but I don't know how to start. It seems
>> complicated.
>> I used Bio::AlignIO to parse the FASTA file, so I can write the
>> parsed alignment in MSF or ClustalW format.
>>
>> Here is an example:
>> CLUSTAL W(1.81) multiple sequence alignment
>>
>>
>> Sequence/9-273    DKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAAVMKE
>> 2pl0:A/6-268      DEWEVPRETLKLVERLGAGQFGEVWMGYYNGHT-KVAVKSLKQGSMSPDAFLAEANLMKQ
>>                   *:**: *  :.: .:**.**:***: * :: :: .****:**:.:*. : ** ** :**:
>>
>> Sequence/9-273    IKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVSAVVLLYMATQISSAME
>> 2pl0:A/6-268      LQHQRLVRLYAVVTQEP-IYIITEYMENGSLVDFLKTPSGIKLTINKLLDMAAQIAEGMA
>>                   ::* .**:* .* *:** :*****:*  *.*:*:*:  .  :::   ** **:**:..*
>>
>> Say I choose two residues: how can I extract them from the
>> alignment, starting from their positions in the PDB file? I need to
>> walk from there to my sequence.
>>
>> I don't know if this is clear, because I cannot explain the question
>> properly in English. Are there any Italians on the list?
>> Could anyone help me?
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org