[Biojava-l] Biojava-l Digest, Vol 104, Issue 6
Khalil El Mazouari
khalil.elmazouari at gmail.com
Mon Sep 19 16:35:07 UTC 2011
Hi,
Take a look at http://en.wikipedia.org/wiki/Levenshtein_distance
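
If the goal is to compare sequences over an arbitrary alphabet, the edit distance described on that page can be computed without any BioJava types. A minimal sketch in plain Java follows; the class and method names are illustrative only and are not part of BioJava.

```java
class Levenshtein {
    /**
     * Edit distance between a and b: the minimum number of single-character
     * insertions, deletions and substitutions needed to turn a into b.
     * Uses the standard dynamic-programming recurrence with two rows.
     */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete all i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            int[] tmp = prev; // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("ABCE", "ADE")); // prints 2
    }
}
```

Note that this gives a distance between two sequences, not a multiple alignment, so it only addresses the pairwise part of the question.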
Regards,
khalil
On 19 Sep 2011, at 18:00, biojava-l-request at lists.open-bio.org wrote:
> Today's Topics:
>
> 1. Re: [Biojava-dev] A question about multiple alignment
> (Andreas Prlic)
> 2. UniprotParser (Saif Ur-Rehman)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sun, 18 Sep 2011 16:50:27 -0700
> From: Andreas Prlic <andreas at sdsc.edu>
> Subject: Re: [Biojava-l] [Biojava-dev] A question about multiple
> alignment
> To: Shahab Kamali <skamali at cs.uwaterloo.ca>
> Cc: biojava-l at biojava.org
> Message-ID:
> <CALthepxeBhoVSpzC3Yvu1_+15OurcEyeZsYAuX8qm1MNh-dXzQ at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi Shahab,
>
> Sounds like you want to use an identity matrix for the alignment.
>
> Andreas
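
To illustrate the effect of identity-style scoring, here is a small pairwise global alignment (Needleman-Wunsch) in plain Java. It is only a sketch and does not use the BioJava API: the class name and the score constants are assumptions chosen for the example. One point it makes visible: if a mismatch scores 0 and gaps also cost 0, a mismatched column and a pair of gaps tie, so the aligner is free to put D under B. Scoring a mismatch below the gap penalty (here -1 versus 0) removes that tie.

```java
class IdentityAlign {
    // Assumed scores for this sketch: identical symbols score 1,
    // different symbols score -1, gaps are free.
    static final int MATCH = 1;
    static final int MISMATCH = -1;
    static final int GAP = 0;

    /** Returns the two gapped strings of one optimal global alignment. */
    static String[] align(String a, String b) {
        int n = a.length(), m = b.length();
        int[][] s = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) s[i][0] = s[i - 1][0] + GAP;
        for (int j = 1; j <= m; j++) s[0][j] = s[0][j - 1] + GAP;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int sub = (a.charAt(i - 1) == b.charAt(j - 1)) ? MATCH : MISMATCH;
                s[i][j] = Math.max(s[i - 1][j - 1] + sub,
                        Math.max(s[i - 1][j] + GAP, s[i][j - 1] + GAP));
            }
        }
        // Trace back from the bottom-right corner, preferring matches,
        // then gaps, and only then a mismatched column.
        StringBuilder x = new StringBuilder(), y = new StringBuilder();
        int i = n, j = m;
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0 && a.charAt(i - 1) == b.charAt(j - 1)
                    && s[i][j] == s[i - 1][j - 1] + MATCH) {
                x.append(a.charAt(--i));
                y.append(b.charAt(--j));
            } else if (i > 0 && s[i][j] == s[i - 1][j] + GAP) {
                x.append(a.charAt(--i));
                y.append('-');
            } else if (j > 0 && s[i][j] == s[i][j - 1] + GAP) {
                x.append('-');
                y.append(b.charAt(--j));
            } else {
                x.append(a.charAt(--i));
                y.append(b.charAt(--j));
            }
        }
        return new String[] { x.reverse().toString(), y.reverse().toString() };
    }

    public static void main(String[] args) {
        String[] r = align("ABCE", "ADE");
        System.out.println(r[0]); // A-BCE
        System.out.println(r[1]); // AD--E
    }
}
```

This is a pairwise alignment only; whether a progressive multiple alignment built on the same scores produces the desired profile would still need to be checked.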
>
> On Sat, Sep 17, 2011 at 3:28 PM, Shahab Kamali <skamali at cs.uwaterloo.ca> wrote:
>> Thanks Andreas,
>> I want two components that have different names to have an alignment score of 0.
>> My application is not about bio-compounds, so I can use something other
>> than ProteinSequence and AminoAcidCompound. I just need to align sequences
>> over arbitrary alphabets. Could you suggest a solution, please?
>> Thanks a lot,
>> Shahab
>>
>> Quoting Andreas Prlic <andreas at sdsc.edu>:
>>
>>> Hi Shahab,
>>>
>>> Did you check whether the substitution matrix scores your sequences
>>> as you expect? It looks like in your theoretical example the
>>> alignment of B and D is favorable, i.e. it has a positive alignment
>>> score.
>>>
>>> Andreas
>>>
>>>
>>> On Fri, Sep 16, 2011 at 10:56 AM, Shahab Kamali <skamali at cs.uwaterloo.ca>
>>> wrote:
>>>>
>>>> Hi,
>>>> I am using BioJava in a pattern mining project. I want to align a set of
>>>> relatively short sequences, for example {"ABCE", "ABCE", "ADE",
>>>> "ADE"}.
>>>>
>>>> This is a part of my code:
>>>>
>>>> SubstitutionMatrix<AminoAcidCompound> matrix =
>>>>     new SimpleSubstitutionMatrix<AminoAcidCompound>();
>>>> GuideTree<ProteinSequence, AminoAcidCompound> gt =
>>>>     new GuideTree<ProteinSequence, AminoAcidCompound>(lst,
>>>>         Alignments.getAllPairsScorers(lst,
>>>>             Alignments.PairwiseSequenceScorerType.GLOBAL,
>>>>             new SimpleGapPenalty((short) 0, (short) 0), matrix));
>>>> Profile<ProteinSequence, AminoAcidCompound> profile =
>>>>     Alignments.getProgressiveAlignment(gt,
>>>>         Alignments.ProfileProfileAlignerType.GLOBAL,
>>>>         new SimpleGapPenalty((short) 0, (short) 0), matrix);
>>>>
>>>> The result of the above code is:
>>>> ABCE
>>>> ABCE
>>>> AD-E
>>>> AD-E
>>>>
>>>> But what I need is
>>>> A-BCE
>>>> A-BCE
>>>> AD--E
>>>> AD--E
>>>> or
>>>> ABC-E
>>>> ABC-E
>>>> A--DE
>>>> A--DE
>>>>
>>>> Do you have any suggestion?
>>>> Thanks,
>>>> Shahab
>>>>
> ------------------------------
>
> Message: 2
> Date: Mon, 19 Sep 2011 11:09:46 +0100
> From: Saif Ur-Rehman <su24 at st-andrews.ac.uk>
> Subject: [Biojava-l] UniprotParser
> To: biojava-l at biojava.org
> Message-ID:
> <CABpZy=wUXJM42NVjmSetwX463hT+B5RLjwc2KP0R00rDiTYD-Q at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Dear all,
>
> I am having issues with the BioJava UniProt parser as detailed below:
>
> Code:
>
> BufferedReader br = new BufferedReader(new FileReader(files[index]));
> Namespace ns = RichObjectFactory.getDefaultNamespace();
> RichSequenceIterator iterator = RichSequence.IOTools.readUniProt(br, ns);
> while (iterator.hasNext()) {
>     try {
>         RichSequence rs = iterator.nextRichSequence();
>     } catch (NoSuchElementException e) {
>         // ignored
>     } catch (BioException e) {
>         e.printStackTrace();
>     }
> }
>
>
>
>
> The file I am using is downloaded from the link:
>
> ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/uniprot_sprot_fungi.dat.gz
>
>
> The problem is that the parser works for some of the entries in the file
> but throws an exception on others.
>
> Sample Exception stack trace:
>
> *** Start of trace *************************
>
> at
> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
> at uniprot.mp.main(mp.java:161)
> Caused by: org.biojava.bio.seq.io.ParseException:
>
> A Exception Has Occurred During Parsing.
> Please submit the details that follow to biojava-l at biojava.org or post a bug
> report to http://bugzilla.open-bio.org/
>
> Format_object=org.biojavax.bio.seq.io.UniProtFormat
> Accession=P53031
> Id=
> Comments=
> Parse_block=RN [1]RP NUCLEOTIDE SEQUENCE [GENOMIC DNA].RC STRAIN=NCYC
> 2512;RX MEDLINE=97082501; PubMed=8923737;
> DOI=10.1002/(SICI)1097-0061(199610)12:13<1321::AID-YEA27>3.0.CO;2-6;RA
> Rodriguez P.L., Ali R., Serrano R.;RT "CtCdc55p and CtHa13p: two putative
> regulatory proteins from Candida
> tropicalis with long acidic domains.";RL Yeast 12:1321-1329(1996).
> Stack trace follows ....
>
>
> at
> org.biojavax.bio.seq.io.UniProtFormat.readRichSequence(UniProtFormat.java:615)
> at
> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
> ... 1 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
> at
> org.biojavax.bio.seq.io.UniProtFormat.readRichSequence(UniProtFormat.java:486)
> ... 2 more
> org.biojava.bio.BioException: Could not read sequence
> at
> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
> at uniprot.mp.main(mp.java:161)
> Caused by: org.biojava.bio.seq.io.ParseException: Name has not been supplied
>
> ********End of trace**********************************
>
> An example of an Id that worked is:
>
> ZYM1_SCHPO
>
> while an ID that didn't work is:
>
> ZUO1_YEAST
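
One way to narrow this down is to cut the flat file into single entries and parse each one separately, so that the failing records can be listed and inspected. The sketch below is plain Java and makes no BioJava calls; it relies only on the UniProt flat-file conventions that every entry ends with a line containing "//" and starts with an "ID" line. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class UniProtEntrySplitter {
    /** Splits flat-file text into entries; each entry ends with a "//" line. */
    static List<String> split(BufferedReader in) {
        List<String> entries = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try {
            String line;
            while ((line = in.readLine()) != null) {
                current.append(line).append('\n');
                if (line.equals("//")) {
                    entries.add(current.toString());
                    current.setLength(0);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Keep a trailing entry that lacks its terminator, if there is one.
        if (!current.toString().trim().isEmpty()) {
            entries.add(current.toString());
        }
        return entries;
    }

    /** Returns the entry name from the ID line, or null if there is none. */
    static String entryName(String entry) {
        for (String line : entry.split("\n")) {
            if (line.startsWith("ID ")) {
                String[] parts = line.trim().split("\\s+");
                return parts.length > 1 ? parts[1] : null;
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: UniProtEntrySplitter <uncompressed .dat file>");
            return;
        }
        try (BufferedReader br = new BufferedReader(new FileReader(args[0]))) {
            for (String entry : split(br)) {
                System.out.println(entryName(entry));
            }
        }
    }
}
```

Each entry string can then be wrapped in a BufferedReader over a StringReader and passed to RichSequence.IOTools.readUniProt(br, ns) as in the code above, catching the exception per entry to collect the names that fail. For a file of this size it may be preferable to process entries one at a time rather than holding the whole list in memory.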
>
> Thanks a lot in advance.
>
> Cheers,
> Saif
>
>
> --
> Saif Ur-Rehman
>
> Centre for Evolution, Genes and Genomics
> Harold Mitchell Building
> University of St Andrews
> St Andrews
> Fife
> KY16 9TH
> UK
>
> Tel: +44 131 5572556
> Fax: +44 1334 463366
>
>