[Bioperl-l] Affys ReseqChip

Marian Thieme marian.thieme at lycos.de
Fri Apr 13 10:12:51 UTC 2007


Hi,

To give a better understanding of the matter and to let you assess the approach, I will briefly present
1.) the problem and 2.) my approach.


1.)
Given: fragments (strings of a certain length), each with a description of its location within some reference sequence. For instance:

- redundant fragment: acgtnna--gcta (deletion: pos12, pos13)
- start position: 5
- end position: 17
- and a suitable reference sequence

Fragments are assumed to map 1:1 to the reference sequence and can contain gaps and n's; an n indicates that the base wasn't determined, perhaps because of failed hybridization or something similar.
Thus we can handle insertions/deletions simply by parsing an array design file (a description of all insertions and deletions in each redundant fragment) and, according to that description, inserting gaps into the reference sequence and into the fragments where required.
So from my point of view, and in the case of the Affy MitoChip v2, we only need to process the description file rather than calculate an alignment via a dynamic programming matrix.
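
To make the fragment description concrete, here is a minimal sketch in plain Python (not BioPerl) of the 1:1 mapping. It assumes 1-based, inclusive start/end positions and that deletions are already written as '-' in the fragment; the function name map_fragment is made up for illustration.

```python
def map_fragment(fragment, start, end):
    """Pair each fragment character with its 1-based reference position."""
    # Assumption: the fragment maps 1:1, so its length must equal the span.
    if len(fragment) != end - start + 1:
        raise ValueError("fragment length does not match start/end")
    return [(start + i, ch) for i, ch in enumerate(fragment)]

# The example fragment from above: start 5, end 17.
cols = map_fragment("acgtnna--gcta", 5, 17)
deleted = [pos for pos, ch in cols if ch == "-"]
undetermined = [pos for pos, ch in cols if ch == "n"]
print(deleted)       # [12, 13]
print(undetermined)  # [9, 10]
```

The deleted positions come out as 12 and 13, matching the description of the example fragment.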


2.)
My current approach consists of the following 5 steps:

1.) Read the reference sequence and the redundant fragments via a SeqIO object.

2.) Calculate a hash of all insertions, defined by length and position.

3.) Insert the longest insertion of each position into the appropriate fragments and into the reference sequence. Hence, insert as many gaps as given by

length(max_insertion(position_x)) - length(insertion(fragment_y, position_x))

into each fragment/reference sequence.
(This is done by iterating over each sequence in the SeqIO and inserting gaps according to the insertion hash.)

4.) Create a SimpleAlign object from LocatableSeq objects.

5.) Afterwards we can do some statistical analysis and calculate a consensus base for each column in the SimpleAlign. (I use a Statistics module from CPAN.)
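
Steps 2 and 3 can be sketched roughly as follows, again in plain Python rather than BioPerl. The layout of the insertion hash ({sequence id: {reference position: inserted bases}}) and the function names are assumptions for illustration only. An insertion is taken to sit directly after its reference position, and each sequence is assumed to hold only reference-mapped characters.

```python
def max_insertion_lengths(insertions):
    """Map each reference position to the length of its longest insertion."""
    longest = {}
    for per_seq in insertions.values():
        for pos, bases in per_seq.items():
            longest[pos] = max(longest.get(pos, 0), len(bases))
    return longest

def pad_sequence(seq, start, own_insertions, longest):
    """After each position that has an insertion somewhere, add this
    sequence's own insertion plus gaps up to the longest insertion there."""
    out = []
    for offset, base in enumerate(seq):
        pos = start + offset
        out.append(base)
        width = longest.get(pos, 0)
        if width:
            # gaps added: length(max_insertion) - length(own insertion)
            out.append(own_insertions.get(pos, "").ljust(width, "-"))
    return "".join(out)

# Hypothetical data: two fragments with insertions after position 4.
insertions = {"f1": {4: "tt"}, "f2": {4: "t"}}
longest = max_insertion_lengths(insertions)
ref = pad_sequence("acgtacgt", 1, {}, longest)
f1 = pad_sequence("gtac", 3, insertions["f1"], longest)
f2 = pad_sequence("tacg", 4, insertions["f2"], longest)
print(ref, f1, f2)  # acgt--acgt gtttac tt-acg
```

The reference receives two gaps, f1 keeps its full insertion, and f2 gets one gap, which is the difference given by the formula in step 3.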

Unfortunately I didn't manage to find a method that gives me the set of bases (column) for a given position in the alignment (did I overlook something? Is SimpleAlign not appropriate?), so I iterate over each position (base) of the reference sequence and, for each position, over the fragments which cover it.
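
The iteration just described can be sketched like this in plain Python (not the SimpleAlign API). Rows are assumed to be (start column, aligned string) pairs in 1-based alignment coordinates. The simple majority vote stands in for whatever the statistics module computes, and it treats '-' as a valid state while ignoring n's.

```python
from collections import Counter

def column(rows, pos):
    """Characters of all rows that cover alignment column pos (1-based)."""
    return [seq[pos - start] for start, seq in rows
            if start <= pos < start + len(seq)]

def consensus_base(col):
    """Most frequent character in the column, ignoring undetermined n's."""
    counts = Counter(ch for ch in col if ch != "n")
    return counts.most_common(1)[0][0] if counts else "n"

# Hypothetical rows: the reference plus three overlapping fragments.
rows = [(1, "acgtacgt"), (3, "gtnc"), (4, "tacg"), (4, "tgcg")]
col5 = column(rows, 5)
print(col5)                  # ['a', 'n', 'a', 'g']
print(consensus_base(col5))  # a
```

Only the rows whose range covers the requested column contribute, so column 8 in this example contains just the reference base.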


Marian




Jonathan Epstein wrote:

> This sounds great to me.
>
> Resequencing in general (whether by Affy or by other technology such as Solexa) is likely to become important in the coming few years, and I wonder whether it's worth thinking about a general paradigm for handling this.  But I suggest that you proceed full-speed-ahead, and we can sort this out in the future.
>
> Perhaps one of the experts can advise you whether to use the Bio::UnivAln object, some of the Bio::Assembly objects, or some other approach.
>
> Jonathan


