[Biojava-dev] [Biojava-l] No. of gaps in aligned sequences
Muhammad Tariq Pervez
tariq_cp at hotmail.com
Fri Jul 8 05:09:51 UTC 2011
The code is as follows. It is taken from the BioJava Cookbook with a
small modification. The following method is called from another class.
It takes a list of FASTA file names as its argument.
public void MSAFromFiles(List<String> ids) throws Exception {
    List<ProteinSequence> lst = new ArrayList<ProteinSequence>();
    ProteinSequence pSeq = null;
    for (String id : ids) {
        pSeq = getSequenceFromFiles(id);
        lst.add(pSeq);
        //System.out.println("seq==" + pSeq);
    }
    profile = Alignments.getMultipleSequenceAlignment(lst);
}
The getSequenceFromFiles() method is given below:
private ProteinSequence getSequenceFromFiles(String inputFile) throws Exception {
    ProteinSequence seq = null;
    //System.out.println("inputFile===" + inputFile);
    FileInputStream is = new FileInputStream(inputFile);
    FastaReader<ProteinSequence, AminoAcidCompound> fastaReader =
        new FastaReader<ProteinSequence, AminoAcidCompound>(
            is,
            new GenericFastaHeaderParser<ProteinSequence, AminoAcidCompound>(),
            new ProteinSequenceCreator(AminoAcidCompoundSet.getAminoAcidCompoundSet()));
    LinkedHashMap<String, ProteinSequence> proteinSequences = fastaReader.process();
    is.close();
    //System.out.println("proteinSequences=" + proteinSequences);
    //LinkedHashMap<String, ProteinSequence> a = FastaReaderHelper.readFastaProteinSequence(new File(fileName));
    for (Entry<String, ProteinSequence> entry : proteinSequences.entrySet()) {
        seq = new ProteinSequence(entry.getValue().getSequenceAsString());
        seq.setAccession(entry.getValue().getAccession());
        //System.out.println("Inside getSequenceFromFile=" + seq);
        //FastaReaderHelper.readFastaDNASequence for DNA sequences
    }
    return seq;
}
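Note that the loop in getSequenceFromFiles() overwrites seq on every iteration, so only the last record of each file is returned. If a file can hold more than one sequence, all records would need to be collected. The following is a minimal plain-Java sketch of that idea, not BioJava code: the class FastaSketch and the method parse are hypothetical names, and the parser handles only simple FASTA text that has already been read into a String.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of BioJava.
final class FastaSketch {

    // Parses FASTA text into an ordered map of header -> sequence,
    // keeping every record rather than only the last one.
    static Map<String, String> parse(String fastaText) {
        Map<String, String> records = new LinkedHashMap<String, String>();
        String header = null;
        StringBuilder body = new StringBuilder();
        for (String rawLine : fastaText.split("\\r?\\n")) {
            String line = rawLine.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (line.startsWith(">")) {
                // A new header closes the previous record, if any.
                if (header != null) {
                    records.put(header, body.toString());
                }
                header = line.substring(1).trim();
                body.setLength(0);
            } else if (header != null) {
                body.append(line); // sequence data may span several lines
            }
        }
        if (header != null) {
            records.put(header, body.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        String fasta = ">seq1\nMKV\nLA\n>seq2\nMKT\n";
        for (Map.Entry<String, String> e : parse(fasta).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

With BioJava itself, the same effect would come from adding every value of the map returned by fastaReader.process() to the list, instead of returning a single sequence.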
After getting the Profile object, I wrote the following code to display the number of gaps:
List<AlignedSequence<ProteinSequence, AminoAcidCompound>> listOfalSeq = profile.getAlignedSequences();
AlignedSequence<ProteinSequence, AminoAcidCompound> alSeq;
int noOfcompounds = 0;
int numOfGaps = 0;
StringBuilder html = new StringBuilder(
    "<html><body><table border=1><tr><td>Accession Id</td><td>Number of gaps</td></tr>");
for (int i = 0; i < listOfalSeq.size(); i++) {
    alSeq = listOfalSeq.get(i);
    accessionId = alSeq.getAccession().getID();
    noOfcompounds = alSeq.countCompounds();
    numOfGaps = alSeq.getNumGaps();
    html.append("<tr><td>");
    html.append(accessionId);
    html.append("</td><td>");
    html.append(numOfGaps);
    html.append("</td></tr>");
    //System.out.println("accessionId==" + accessionId);
    //pSeq=new ProteinSequence(seq.getSequenceAsString(),seq.getCompoundSet());
    //pSeq.setAccession(seq.getAccession());
    //multipleSequenceAlignment.addAlignedSequence(pSeq);
}
html.append("</table></body></html>");
setText(html.toString());
setText() is a method of JEditorPane or JTextPane.
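To narrow down the discrepancy, it may help to count gaps in two ways on the string returned by alSeq.getSequenceAsString(): every '-' character, and every contiguous run of '-' characters. The sketch below does this in plain Java. GapCounter and its methods are hypothetical names, not BioJava API, and whether getNumGaps() corresponds to either count is an assumption that this thread does not confirm.

```java
// Hypothetical helper for illustration only; not part of BioJava.
final class GapCounter {

    // Counts every '-' character in the aligned string.
    static int countGapCharacters(String aligned) {
        int count = 0;
        for (int i = 0; i < aligned.length(); i++) {
            if (aligned.charAt(i) == '-') {
                count++;
            }
        }
        return count;
    }

    // Counts contiguous runs of '-' characters, so "--" is one gap.
    static int countGapRuns(String aligned) {
        int runs = 0;
        boolean inGap = false;
        for (int i = 0; i < aligned.length(); i++) {
            boolean isGap = aligned.charAt(i) == '-';
            if (isGap && !inGap) {
                runs++; // a new run starts here
            }
            inGap = isGap;
        }
        return runs;
    }

    public static void main(String[] args) {
        String aligned = "MK--V-LA---T";
        System.out.println("gap characters = " + countGapCharacters(aligned)); // 6
        System.out.println("gap runs       = " + countGapRuns(aligned));       // 3
    }
}
```

Printing both values next to alSeq.getNumGaps() for each aligned sequence would show which definition, if either, matches the number the method reports.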
Tariq, PhD Scholar
Muhammad Tariq Pervez
Assistant Professor,
Department of Computer Science
Virtual University of Pakistan, Lahore
Tel: (042) 9203114-7
URL: www.vu.edu.pk
Mobile: +923364120541, +923214602694
> Date: Thu, 7 Jul 2011 08:10:53 -0700
> Subject: Re: [Biojava-l] No. of gaps in aligned sequences
> From: andreas at sdsc.edu
> To: tariq_cp at hotmail.com
> CC: biojava-l at biojava.org; biojava-dev at biojava.org
>
> Hi Tariq,
>
> Can you send us the sample code / DB accession IDs so we can try to
> reproduce this?
>
> Andreas
>
> On Wed, Jul 6, 2011 at 4:37 AM, Muhammad Tariq Pervez
> <tariq_cp at hotmail.com> wrote:
> >
> >
> > Hi, Dear all,
> > I am developing an MSA application using BioJava and would like to clarify one thing. When two or more protein sequences are aligned, the '-' character appears more times in an aligned sequence than the number of gaps reported by alSeq.getNumGaps(), where 'alSeq' is an aligned sequence. For example, an aligned sequence may actually contain 50 '-' characters while the method reports only 30. What accounts for the difference between these two results?
> >
> > Best Regards
> >
> >
> > Tariq, PhD Scholar
> >
> > _______________________________________________
> > Biojava-l mailing list - Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
> >