[Biojava-l] how to calculate consensus from a fasta file
Eric BELLARD
eric_bellard at yahoo.com
Wed Jan 14 04:03:05 EST 2004
Thanks for your response.
My problem is simpler than you thought.
I simply have to calculate the ambiguity symbol for
each column.
My solution is:
- create a list with a set of symbols for each column
- fill each set with the symbols found in that column
across all sequences
- calculate the ambiguity symbol for each set in this
list
It works pretty well, but if the sequences become too
long I imagine I'll use too much memory.
I'll try to find another solution using the Alignment
object in the framework. At the moment I don't know
the framework well enough to find a solution of this
kind with it. I'll try...
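One way to keep the memory bounded is sketched below in plain Java, without
BioJava. It replaces the set per column with a single 4-bit mask per column
(A=1, C=2, G=4, T=8), so the cost is one byte per column no matter how many
sequences are read. The class name ColumnConsensus and its methods are
hypothetical, and the sketch assumes the sequences are already aligned, of
equal length, and that '-' marks a gap that contributes nothing to the column.

```java
import java.util.Arrays;
import java.util.List;

class ColumnConsensus {
    // IUPAC nucleotide code indexed by a 4-bit mask: A=1, C=2, G=4, T=8.
    // Index 0 (no base seen, e.g. a gap-only column) maps to '-'.
    private static final String IUPAC = "-ACMGRSVTWYHKDBN";

    // Mask of a symbol is its index in the table, so ambiguity symbols
    // in the input (e.g. 'R' = A or G) are handled too.
    static int maskOf(char symbol) {
        int mask = IUPAC.indexOf(Character.toUpperCase(symbol));
        if (mask < 0) {
            throw new IllegalArgumentException("Unsupported symbol: " + symbol);
        }
        return mask;
    }

    // OR the masks of every sequence into one byte per column, then
    // translate each accumulated mask back to its ambiguity symbol.
    static String consensus(Iterable<String> alignedSequences) {
        byte[] masks = null;
        for (String seq : alignedSequences) {
            if (masks == null) {
                masks = new byte[seq.length()];
            } else if (seq.length() != masks.length) {
                throw new IllegalArgumentException("Sequences must have equal length");
            }
            for (int i = 0; i < masks.length; i++) {
                masks[i] |= maskOf(seq.charAt(i));
            }
        }
        if (masks == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder(masks.length);
        for (byte m : masks) {
            sb.append(IUPAC.charAt(m));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> seqs = Arrays.asList("ACGT", "ACGA", "TCGG");
        System.out.println(consensus(seqs));
    }
}
```

Because each sequence is folded into the masks as soon as it is read, the
sequences could be taken one at a time from the fasta parser rather than
all held in memory.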
Anyway thanks for your help.
Eric
--- mark.schreiber at group.novartis.com wrote:
> Hi Eric -
>
> I'm not sure if this will solve your problem but you
> could make an
> Alignment object from the sequences and then use the
> methods of
> DistributionTools to get a Distribution object for
> each position in the
> Alignment. These distributions will tell you the
> frequency of each base at
> each position in the Alignment which you could use
> to make a consensus.
> You can also use DistributionTools to calculate
> information or entropy at
> each position.
>
> Alternatively you could generate a markov model that
> represents the
> alignment and probabilistically represents the
> consensus.
>
> Hope this helps
>
> Mark
>
>
>
> Mark Schreiber
> Principal Scientist (Bioinformatics)
>
> Novartis Institute for Tropical Diseases (NITD)
> 1 Science Park Road
> #04-14 The Capricorn
> Singapore 117528
>
> phone +65 6722 2973
> fax +65 6722 2910
>
>
>
>
>
> Eric BELLARD <eric_bellard at yahoo.com>
> Sent by: biojava-l-bounces at portal.open-bio.org
> 01/13/2004 09:35 PM
> Please respond to eric
>
>
> To: biojava-l at biojava.org
> cc:
> Subject: [Biojava-l] how to calculate
> consensus from a fasta file
>
>
> Hi,
>
> I'd like to first thank you all for your great job
> on
> this project.
>
> I'm using biojava in a project to store some
> sequencing result.
>
> In my application the user uploads sequences from a
> fasta file, and I'd like to build an alignment from
> it.
>
> With your project, I can easily parse the fasta file
> and get all the sequences.
>
> Let's consider the sequences as lines.
> I'd like to calculate the column consensus using the
> DNA degenerate alphabet.
>
> Does biojava implement a way to do this?
>
> Thanks in advance.
>
> Eric
>
>
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
>
>
>