[Bioperl-l] Fwd: help parsing msf file or clustalW file reports

Fri Sep 4 16:04:05 UTC 2009

Paola - it is important to keep emailing the mailing list for help.
I'm hoping another person on the list can help, as I am swamped
right now.
-jason

Begin forwarded message:

> From: Paola Bisignano <paola_bisignano at yahoo.it>
> Date: September 4, 2009 5:48:22 AM PDT
> To: Jason Stajich <jason at bioperl.org>
> Subject: Re: [Bioperl-l] help parsing msf file or clustalW file  
> reports
>
> Hi Jason, thanks for your answer. I have spent two days re-studying
> the BioPerl synopsis and object-oriented programming. I understand
> what you mean, but I still have some problems: I don't actually know
> how to start parsing this kind of file. I generated this MSF (or
> ClustalW) file by parsing a FASTA file of multiple paired sequences
> and writing it out as MSF, extracting only the paired sequences I
> want, i.e. homologous proteins that have the same ligand in the PDB.
>
> I have a problem with parsing the MSF file: I can't find the right
> Bio::SimpleAlign method for my case. I have to identify residues
> (from a list) in aligned sequences. When I parse the alignment from
> the FASTA file, I save it as an MSF file, in which I have to identify
> my residue (from the list, numbered as in the PDB file) and the
> residue aligned with it in the other sequence.
>
> This is a piece of the file:
>
> NoName   MSF: 2  Type: P  Wed Aug 26 10:32:50 2009  Check: 00 ..
>
>  Name: Sequence/23-178  Len:    156  Check:  8937  Weight:  1.00
>  Name: 2zhz:A/1-148     Len:    156  Check:  9006  Weight:  1.00
>
> //
>
>                       1                                                   50
> Sequence/23-178       NDPRVAAYGE VDELNSWVGY TKSLINSHTQ VLSNELEEIQ QLLFDCGHDL
> 2zhz:A/1-148          DDARIAAIGD VDELNSQIGV L--LAEPLPD DVRAALSAIQ HDLFDLGGEL
>
>                       51                                                 100
> Sequence/23-178       ATPADDERHS FKFKQEQPTV WLEEKIDNYT QVVPAVKKHI LPGGTQLASA
> 2zhz:A/1-148          CIPGHAAITD AHLARLDG-- WLA----HYN GQLPPLEEFI LPGGARGAAL
>
>                       101                                                150
> Sequence/23-178       LHVARTITRR AERQIVQLMR EEQINQDVLI FINRLSDYFF AAARYANYLE
> 2zhz:A/1-148          AHVCRTVCRR AERSIVALGA SEPLNAAPRR YVNRLSDLLF VLARVLNRAA
>
>                       151                                                200
> Sequence/23-178       QQPDML
> 2zhz:A/1-148          GGADVL
>
> For example, here I have to identify the residue in 2zhz:A that is
> aligned with Val 28 of Sequence (counting manually, it is Ile 5).
> Tyr 4 has no aligned residue because the alignment starts from N23
> of Sequence.
>
> How can I enter a residue of my sequence and extract the aligned
> residue from the other one?
>
> Best wishes to you all, dear friends. I'm really in trouble with this.
>
> Thanks for any suggestions.
>
> --- On Tue 1/9/09, Jason Stajich <jason at bioperl.org> wrote:
>
> From: Jason Stajich <jason at bioperl.org>
> Subject: Re: [Bioperl-l] help parsing msf file or clustalW file
> reports
> To: "Paola Bisignano" <paola_bisignano at yahoo.it>
> Cc: bioperl-l at lists.open-bio.org
> Date: Tuesday, 1 September 2009, 17:49
>
> I think you might want to use the column_from_residue_number method
> that is part of Bio::SimpleAlign - it lets you get the column from
> an alignment based on the sequence residue, doing some math along
> the way to deal with gaps. That is the residue -> alignment
> direction.  If you are starting at the alignment and want to get the
> residue's position, you will use location_from_column on a
> particular sequence, like so:
>
>     # select somehow a sequence from the alignment, e.g.
>     my $seq = $aln->get_seq_by_pos(1);
>     #$loc is undef or Bio::LocationI object
>     my $loc = $seq->location_from_column(5);
>
> -jason
>
> On Sep 1, 2009, at 5:20 AM, Paola Bisignano wrote:
>
>> Hi,
>>
>> I'm trying to parse FASTA files that contain pairs of aligned
>> sequences. I need to identify my residues in the alignment. I have
>> separate lists derived from parsing LIGPLOT files, so I have to
>> manipulate strings, but I don't know how to start; it seems
>> complicated.
>> I used Bio::AlignIO to parse the FASTA file, so I can write it out
>> in MSF or ClustalW format.
>>
>> Here is an example:
>> CLUSTAL W(1.81) multiple sequence alignment
>>
>>
>> Sequence/9-273          DKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAAVMKE
>> 2pl0:A/6-268            DEWEVPRETLKLVERLGAGQFGEVWMGYYNGHT-KVAVKSLKQGSMSPDAFLAEANLMKQ
>>                         *:**: *  :.: .:**.**:***: * :: :: .****:**:.:*. : ** ** :**:
>>
>>
>> Sequence/9-273          IKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVSAVVLLYMATQISSAME
>> 2pl0:A/6-268            LQHQRLVRLYAVVTQEP-IYIITEYMENGSLVDFLKTPSGIKLTINKLLDMAAQIAEGMA
>>                         ::* .**:* .* *:** :*****:*  *.*:*:*:  .  :::   ** **:**:..*
>>
>>
>> I chose two residues, for example. How can I extract them, starting
>> from their positions in the PDB file? I need to walk along my
>> sequence.
>>
>> I don't know if this is clear, because I cannot explain the question
>> properly in English. Are there any Italians on the list?
>> Could anyone help me?
>>
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
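
A minimal sketch of the residue -> column -> residue lookup discussed in the
quoted messages, combining column_from_residue_number (Bio::SimpleAlign) with
location_from_column (Bio::LocatableSeq). The alignment is the first 50
columns of the MSF excerpt, built in memory so the script needs no input
file. The helper name aligned_residue and the file name in the comment are
illustrative assumptions, not part of BioPerl. Residue numbers are the ones
implied by the /23-178 and /1-148 ranges in the MSF names; under that
numbering the Val in column 5 is residue 27, so if the PDB numbering is
offset from it, the offset has to be applied before the lookup.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::LocatableSeq;
use Bio::SimpleAlign;

# First 50 columns of the quoted MSF alignment, built in memory.
# With a real file the alignment would instead come from Bio::AlignIO, e.g.
#   Bio::AlignIO->new(-file => 'pair.msf', -format => 'msf')->next_aln;
# ('pair.msf' is a hypothetical file name).
my $aln = Bio::SimpleAlign->new();
$aln->add_seq(Bio::LocatableSeq->new(
    -id       => 'Sequence',
    -start    => 23,
    -end      => 72,
    -alphabet => 'protein',
    -seq      => 'NDPRVAAYGEVDELNSWVGYTKSLINSHTQVLSNELEEIQQLLFDCGHDL',
));
$aln->add_seq(Bio::LocatableSeq->new(
    -id       => '2zhz:A',
    -start    => 1,
    -end      => 48,
    -alphabet => 'protein',
    -seq      => 'DDARIAAIGDVDELNSQIGVL--LAEPLPDDVRAALSAIQHDLFDLGGEL',
));

# Hypothetical helper: given a residue number in $ref_id, return the
# alignment column plus the residue letter and number aligned with it in
# $other_id. Letter and number are undef when $other_id has a gap there
# (or when the column lies before its first residue).
sub aligned_residue {
    my ($aln, $ref_id, $resnum, $other_id) = @_;

    # residue -> alignment column (gaps are accounted for by BioPerl)
    my $col = $aln->column_from_residue_number($ref_id, $resnum);

    # alignment column -> location in the other sequence
    my ($other) = $aln->each_seq_with_id($other_id);
    my $loc = $other->location_from_column($col);
    return ($col, undef, undef)
        if !defined $loc || $loc->location_type eq 'IN-BETWEEN';

    # the letter is read straight from the gapped string at that column
    my $aa = substr($other->seq, $col - 1, 1);
    return ($col, $aa, $loc->start);
}

for my $resnum (27, 45) {
    my ($col, $aa, $num) =
        aligned_residue($aln, 'Sequence', $resnum, '2zhz:A');
    if (defined $aa) {
        print "Sequence $resnum -> column $col -> 2zhz:A $aa$num\n";
    }
    else {
        print "Sequence $resnum -> column $col -> gap in 2zhz:A\n";
    }
}
```

Counting by hand in the alignment, residue 27 of Sequence sits in column 5
opposite Ile 5 of 2zhz:A, and residue 45 sits in column 23 opposite a gap, so
those are the two cases the loop is meant to exercise. A residue number
outside the aligned range (such as Tyr 4) makes column_from_residue_number
throw, so a caller may want to wrap the call in eval.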