From Michael.Rusch at STJUDE.ORG Wed Dec 1 13:37:00 2010
From: Michael.Rusch at STJUDE.ORG (Rusch, Michael)
Date: Wed, 1 Dec 2010 12:37:00 -0600
Subject: [Biojava-l] newbie question about gff
Message-ID:

Newbie question:

I want to read in a GFF file and process it taking full advantage of the gene model (e.g. exons belong to a transcript). I see that I can read the GFF into an EntrySet, and then annotate a Sequence with the EntrySet, but I don't have a sequence to start off with. I could read in sequence from a FASTA file, but since I'm not actually using sequence, and I'm dealing with human chromosomes, it seems quite wasteful to do so. I could also just use the EntrySet, but then I can't take advantage of the gene model.

Is there a way to get a Sequence object with no sequence, just features from a GFF file?

Thanks,
Michael

From darnells at dnastar.com Fri Dec 3 18:35:21 2010
From: darnells at dnastar.com (Steve Darnell)
Date: Fri, 3 Dec 2010 17:35:21 -0600
Subject: [Biojava-l] PDBFileParser question using PDBID 470D
In-Reply-To:
References:
Message-ID:

Andreas,

I've been using biojava to gather sequence data from structure files for an internal project. My intent was to test the limitations of my work (hence files similar to 470D), but came across this behavior in biojava.

It is not critical to obtain this particular mapping since it can be derived from the atom records. However, I didn't understand why the SEQRES list would be empty and was looking for clarification. Is it because the chain is RNA and the empty list prevents the unsupported alignment of RNA records?

Regards,
Steve

-----Original Message-----
From: andreas.prlic at gmail.com [mailto:andreas.prlic at gmail.com] On Behalf Of Andreas Prlic
Sent: Monday, November 29, 2010 6:36 PM
To: Steve Darnell
Cc: biojava-l at lists.open-bio.org
Subject: Re: [Biojava-l] PDBFileParser question using PDBID 470D

Hi Steve,

as you already are saying, this is an "exotic" sequence, in the sense that this is an RNA. The alignment of the SEQRES records for RNA is not yet supported. Can you explain a bit more what you are doing and why you need this mapping in this case?

Thanks,
Andreas

On Mon, Nov 29, 2010 at 12:51 PM, Steve Darnell wrote:
> Greetings,
>
> After parsing PDBID 470D with biojava-3.0-alpha5, Chain A returns an
> empty SEQRES sequence (Chain.getSeqResSequence) and empty SEQRES group
> list (Chain.getSeqResGroups) but the one-letter ATOM sequence is
> properly translated and the ATOM group list contains the appropriate
> number of groups (LoadChemCompInfo set to true).
>
> This is an exotic sequence, but my expectation is that the SEQRES group
> list would have members in it (and one-letter sequence translated if
> LoadChemCompInfo is true). Am I mistaken and the current behavior is
> the intended result?
>
> Best regards,
> Steve Darnell
>
> --
> SEQRES records exist in 470D:
>
> SEQRES   1 A   12  C43 G48 C43 G48 A44 A44 U36 U36 C43 G48 C43 G48
> SEQRES   1 B   12  C43 G48 C43 G48 A44 A44 U36 U36 C43 G48 C43 G48
>
> Sample println output (ln 1 record type, ln 2 get${TYPE}Sequence, ln 3
> get${TYPE}Groups):
>
> SEQRES
> ''
> []
>
> ATOM
> 'CGCGAAUUCGCG'
> [PDB: C43 1 trueatoms: 21, PDB: G48 2 trueatoms: 27, PDB: C43 3
> trueatoms: 24, ...]
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

--
-----------------------------------------------------------------------
Dr.
Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From andreas at sdsc.edu Sat Dec 4 12:46:08 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sat, 4 Dec 2010 09:46:08 -0800
Subject: [Biojava-l] PDBFileParser question using PDBID 470D
In-Reply-To:
References:
Message-ID:

Hi Steve,

> I've been using biojava to gather sequence data from structure files for an internal project. My intent was to test the limitations of my work (hence files similar to 470D), but came across this behavior in biojava.

ok

> It is not critical to obtain this particular mapping since it can be derived from the atom records. However, I didn't understand why the SEQRES list would be empty and was looking for clarification. Is it because the chain is RNA and the empty list prevents the unsupported alignment of RNA records?

When working with PDB files, the list gets built up after the alignment. Since RNA alignment is not supported by the parser, the list can't get created...

In principle, mmCIF files contain the information on how to join SEQRES and ATOM groups correctly, so no alignment is needed. I will take another look at how this works in this case...

Andreas

--
-----------------------------------------------------------------------
Dr.
Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From cfriedline at vcu.edu Mon Dec 6 07:45:38 2010
From: cfriedline at vcu.edu (Chris Friedline)
Date: Mon, 6 Dec 2010 07:45:38 -0500
Subject: [Biojava-l] SequenceMixin Error in BioJava3 Alignment
Message-ID:

Hello,

Found another potential error case, this time in beta2 (fresh pull from git last evening). For more info, please see http://pastie.org/1351388 for test case and stack trace. The JUnit test passes simply because the pair object is not null, but fails when trying to extract any information from the pair itself (toString(), getIdenticals(), etc). The substitution matrix file is from ftp://ftp.ncbi.nih.gov/blast/matrices. I'm doing large numbers of pairwise alignments, which do not all fail, but most do with this same error.

Thanks,
Chris

--
PhD Candidate, Integrative Life Sciences
Virginia Commonwealth University
Richmond, VA

From ayates at ebi.ac.uk Mon Dec 6 08:50:20 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Mon, 6 Dec 2010 13:50:20 +0000
Subject: [Biojava-l] SequenceMixin Error in BioJava3 Alignment
In-Reply-To:
References:
Message-ID: <610B0600-699D-4249-A6F6-27EBEAEAD585@ebi.ac.uk>

Hi Chris,

Well that's going into my toStringBuilder() method & that particular line is concerned with asking a compound for its String representation. How often do we get nulls in our Sequences, and how should we deal with them? After all, the Sequence AGTCNULLAGTC is probably more harmful than helpful.

Andy

From ayates at ebi.ac.uk Mon Dec 6 10:13:49 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Mon, 6 Dec 2010 15:13:49 +0000
Subject: [Biojava-l] SequenceMixin Error in BioJava3 Alignment
In-Reply-To: <610B0600-699D-4249-A6F6-27EBEAEAD585@ebi.ac.uk>
References: <610B0600-699D-4249-A6F6-27EBEAEAD585@ebi.ac.uk>
Message-ID: <8C833CB3-C110-496B-9285-1A3FE150ED01@ebi.ac.uk>

So myself & Chris have discussed this off list & we believe it's because of a NULL compound element in the Sequence given to the SequenceMixin method.

Does anyone on list know how the AlignedSequence code encodes gaps & the like?

Andy

--
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From andreas at sdsc.edu Mon Dec 6 12:22:55 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 6 Dec 2010 09:22:55 -0800
Subject: [Biojava-l] SequenceMixin Error in BioJava3 Alignment
In-Reply-To: <8C833CB3-C110-496B-9285-1A3FE150ED01@ebi.ac.uk>
References: <610B0600-699D-4249-A6F6-27EBEAEAD585@ebi.ac.uk> <8C833CB3-C110-496B-9285-1A3FE150ED01@ebi.ac.uk>
Message-ID:

Hi Andy,

Check out the SimpleAlignedSequence class, for how Gaps are handled... Does that help?

Andreas

--
-----------------------------------------------------------------------
Dr.
Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From cfriedline at vcu.edu Mon Dec 6 13:28:37 2010
From: cfriedline at vcu.edu (Chris Friedline)
Date: Mon, 6 Dec 2010 13:28:37 -0500
Subject: [Biojava-l] SequenceMixin Error in BioJava3 Alignment
In-Reply-To:
References: <610B0600-699D-4249-A6F6-27EBEAEAD585@ebi.ac.uk> <8C833CB3-C110-496B-9285-1A3FE150ED01@ebi.ac.uk>
Message-ID:

That does help, thanks. However, when calling getAsList() on the aligned sequences and printing, this is what I see. Something seems wrong. It does appear as though null is being inserted where there should be gaps.

seq = [A, A, C, A, C, T, T, G, A, C, A, T, G, T, T, C, null, G, T, C, G, C, A, A, C, T, T, T, T, A, A, G, A, G, A, T, T, A, G, A, G, T, T, T, T, C, G, G, T, T, C, G, G, C, C, G, G, A, C, G, A, A, A, C, A, C, null, null, null, null, null, null]

seq = [T, A, C, C, C, T, T, A, A, C, A, T, null, null, T, C, A, G, T, G, A, C, A, A, C, C, T, C, null, null, A, G, A, G, A, T, G, A, G, null, G, C, T, T, T, C, T, C, T, T, C, G, G, null, null, null, null, null, null, null, A, G, A, C, A, C, T, G, G, G, A, T]

Chris

--
PhD Candidate, Integrative Life Sciences
Virginia Commonwealth University
Richmond, VA

From cfriedline at vcu.edu Mon Dec 6 13:41:17 2010
From: cfriedline at vcu.edu (Chris Friedline)
Date: Mon, 6 Dec 2010 13:41:17 -0500
Subject: [Biojava-l] SequenceMixin Error in BioJava3 Alignment
In-Reply-To:
References: <610B0600-699D-4249-A6F6-27EBEAEAD585@ebi.ac.uk> <8C833CB3-C110-496B-9285-1A3FE150ED01@ebi.ac.uk>
Message-ID:

OK, so here's a quick fix now that I know where to look. In my local source I added the following line to the constructor of DNACompoundSet and recompiled.

addNucleotideCompound("-", "-");

Not sure if this is the correct place for it in terms of what the devs want to do globally, but it gets me moving forward again. Gap characters are in AminoAcidCompoundSet so I'm wondering if this was just a tiny oversight on the nucleotide front.

Thanks again for the help everyone,
Chris

--
PhD Candidate, Integrative Life Sciences
Virginia Commonwealth University
Richmond, VA

From ayates at ebi.ac.uk Mon Dec 6 14:32:55 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Mon, 6 Dec 2010 19:32:55 +0000
Subject: [Biojava-l] SequenceMixin Error in BioJava3 Alignment
In-Reply-To:
References: <610B0600-699D-4249-A6F6-27EBEAEAD585@ebi.ac.uk> <8C833CB3-C110-496B-9285-1A3FE150ED01@ebi.ac.uk>
Message-ID: <91FB5471-89AD-463F-AB1F-9D93F5EA46C7@ebi.ac.uk>

I would say partially an oversight on my part & partially done on purpose (a gap is not a nucleotide after all). However, I'm all in favour of being pragmatic here, so let's add them in. If I get an okay from the relevant parties I'll commit the change in.

Andy

--
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From willishf at ufl.edu Mon Dec 6 15:00:18 2010
From: willishf at ufl.edu (Scooter Willis)
Date: Mon, 6 Dec 2010 15:00:18 -0500
Subject: [Biojava-l] SequenceMixin Error in BioJava3 Alignment
In-Reply-To: <91FB5471-89AD-463F-AB1F-9D93F5EA46C7@ebi.ac.uk>
References: <610B0600-699D-4249-A6F6-27EBEAEAD585@ebi.ac.uk> <8C833CB3-C110-496B-9285-1A3FE150ED01@ebi.ac.uk> <91FB5471-89AD-463F-AB1F-9D93F5EA46C7@ebi.ac.uk>
Message-ID:

It would be nice to have a cool indexing system that allowed dynamic indexes of the data model but not worth the headache. If we are going to go big we should use the same gap symbols that were added for protein sequences.

Scooter
Andreas Prlic > >>> Senior Scientist, RCSB PDB Protein Data Bank > >>> University of California, San Diego > >>> (+1) 858.246.0526 > >>> -----------------------------------------------------------------------

From ayates at ebi.ac.uk Wed Dec 8 04:06:32 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 8 Dec 2010 09:06:32 +0000
Subject: [Biojava-l] SequenceMixin Error in BioJava3 Alignment
In-Reply-To:
References: <610B0600-699D-4249-A6F6-27EBEAEAD585@ebi.ac.uk> <8C833CB3-C110-496B-9285-1A3FE150ED01@ebi.ac.uk> <91FB5471-89AD-463F-AB1F-9D93F5EA46C7@ebi.ac.uk>
Message-ID: <771AA57F-8AD4-4E9E-A1F0-EA44FA2518F2@ebi.ac.uk>

I've added the gap symbol to the DNA & RNA compound sets. Hopefully this error will go away. If not, then we'll have to look into the alignment code & get it to use the gap symbol.

Andy

--
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From jayunit100 at gmail.com Sun Dec 12 17:42:09 2010
From: jayunit100 at gmail.com (Jay Vyas)
Date: Sun, 12 Dec 2010 17:42:09 -0500
Subject: [Biojava-l] predicting atoms from backbone
Message-ID:

Hi guys,

I'm trying to add C, O, and N atoms to a protein structure when I only have a backbone trace. That is, I have a series of CA atoms in the backbone of a protein structure, and I want to generate the other atom coordinates.

I know BioJava can add hydrogens. Does anybody have any ideas about how to add and predict other atoms?
--
Jay Vyas
MMSB/UCHC

From andreas at sdsc.edu Mon Dec 13 10:56:12 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 13 Dec 2010 07:56:12 -0800
Subject: [Biojava-l] predicting atoms from backbone
In-Reply-To:
References:
Message-ID:

Hi Jay,

I guess you want to do something like this: http://peds.oxfordjournals.org/content/5/2/147.abstract

Although all the tools for such calculations are available in BioJava, there is currently no simple method call that calculates such a main-chain model. Before doing any coding, it is probably best to check whether other software, like COOT, can do that.

Andreas

--
-----------------------------------------------------------------------
Dr.
Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From wuuter at gmail.com Wed Dec 15 21:45:42 2010
From: wuuter at gmail.com (Fico)
Date: Thu, 16 Dec 2010 10:45:42 +0800
Subject: [Biojava-l] how to cancel download chemcomp when parser a PDB file
Message-ID:

Hi, dear all:

I have recently been using biojava3 beta1 to parse PDB files. My program is:

    PDBFileReader pdbreader = new PDBFileReader();
    pdbreader.setAutoFetch(false);
    pdbreader.setPath(pdbDirPath);

    FileParsingParameters params = new FileParsingParameters();
    params.setLoadChemCompInfo(false);
    params.setHeaderOnly(false);
    params.setAlignSeqRes(true);
    params.setParseSecStruc(false);
    pdbreader.setFileParsingParameters(params);

    Structure structure = null;
    try {
        structure = pdbreader.getStructure(pdbDirPath + "\\" + file);
    } catch (IOException e) {
        e.printStackTrace();
    }

When I execute this program, it downloads files such as:

    creating directory D:\MyWorkspace\TestFiles\pdbFiles\chemcomp
    downloading http://www.rcsb.org/pdb/files/ligand/35G.cif
    downloading http://www.rcsb.org/pdb/files/ligand/GDP.cif

But I do not want to download these files. How can I disable this?

Thanks.

From marcelito_tony at hotmail.com Wed Dec 15 22:59:11 2010
From: marcelito_tony at hotmail.com (marcelo rodriguez)
Date: Thu, 16 Dec 2010 03:59:11 +0000
Subject: [Biojava-l] Is there cvtree algorithm in biojava?
In-Reply-To:
References:
Message-ID:

I am looking into the CVTree algorithm. Is it available in BioJava? I need help.

Regards,
Marcelo

From darnells at dnastar.com Thu Dec 16 11:26:51 2010
From: darnells at dnastar.com (Steve Darnell)
Date: Thu, 16 Dec 2010 10:26:51 -0600
Subject: [Biojava-l] how to cancel download chemcomp when parser a PDB file
In-Reply-To:
References:
Message-ID:

The SeqRes to Atom record alignment forces the use of chemical components to translate non-standard residues to their closest standard counterpart for the sequence alignment. I have to disable both setLoadChemCompInfo and setAlignSeqRes when I don't want to download chemical component files from RCSB when parsing a PDB file.

Regards,
Steve
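
The rule in Steve's reply can be sketched as a tiny self-contained model. This is only an illustration of the beta1 behaviour as reported in this thread, not BioJava code: the class `ChemCompFlags` and the method `triggersChemCompDownload` are hypothetical names, and the "either flag triggers a download" logic is an assumption taken from the reply rather than from the parser source.

```java
// Toy model of the beta1 behaviour described in the reply above.
// ChemCompFlags is a hypothetical stand-in, NOT a BioJava class.
public class ChemCompFlags {
    private final boolean loadChemCompInfo; // mirrors setLoadChemCompInfo(...)
    private final boolean alignSeqRes;      // mirrors setAlignSeqRes(...)

    public ChemCompFlags(boolean loadChemCompInfo, boolean alignSeqRes) {
        this.loadChemCompInfo = loadChemCompInfo;
        this.alignSeqRes = alignSeqRes;
    }

    // Assumption from the reply: either flag leads to chemcomp downloads.
    public boolean triggersChemCompDownload() {
        return loadChemCompInfo || alignSeqRes;
    }

    public static void main(String[] args) {
        // Configuration from the original question: downloads still happen.
        System.out.println(new ChemCompFlags(false, true).triggersChemCompDownload());
        // Both flags off: no downloads expected.
        System.out.println(new ChemCompFlags(false, false).triggersChemCompDownload());
    }
}
```

With the real parser, the equivalent step would be to call both `params.setLoadChemCompInfo(false)` and `params.setAlignSeqRes(false)` on the `FileParsingParameters` object from the question.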
From andreas at sdsc.edu Fri Dec 17 10:24:16 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 17 Dec 2010 07:24:16 -0800
Subject: [Biojava-l] how to cancel download chemcomp when parser a PDB file
In-Reply-To:
References:
Message-ID:

OK, that behavior is fixed in SVN now. You can now have setAlignSeqRes set to true and it will not download chemical components if loadChemComp is false. The drawback is that the data representation will not be as precise.

Andreas

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From dasarnow at gmail.com Sat Dec 18 22:36:28 2010
From: dasarnow at gmail.com (Daniel Asarnow)
Date: Sat, 18 Dec 2010 19:36:28 -0800
Subject: [Biojava-l] how to cancel download chemcomp when parser a PDB file
In-Reply-To:
References:

Message-ID:

Prior to this, did setting loadChemComp to true add processing overhead if setAlignSeqRes is also true? I.e., what's the difference between AlignSeqRes with and without loadChemComp?

I just want to know what the right flags are when one wants accurate SEQRES <---> ATOM alignments but isn't otherwise using the components info.

On a related note, I ended up writing a class that loaded and discarded Structure objects for my PDBs, to trigger all the downloads before my big processing jobs. Though I guess the right (non-lazy) thing to do is to parse out the combined components library into the individual files.

-da

From dasarnow at gmail.com Sat Dec 18 22:47:27 2010
From: dasarnow at gmail.com (Daniel Asarnow)
Date: Sat, 18 Dec 2010 19:47:27 -0800
Subject: [Biojava-l] Thread safety of the AtomCache
Message-ID:

Is AtomCache expected to be thread-safe? It would be nice to have multiple threads work off the same AtomCache, when those threads might each ask for some of the same PDB IDs.

Best,

-da

From andreas at sdsc.edu Sun Dec 19 03:19:40 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 19 Dec 2010 00:19:40 -0800
Subject: [Biojava-l] Thread safety of the AtomCache
In-Reply-To:
References:
Message-ID:

Hi Daniel,

Yes, the AtomCache is thread-safe. Caching across multiple threads was one of the ideas behind this class.

Andreas

--
-----------------------------------------------------------------------
Dr.
Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From andreas at sdsc.edu Sun Dec 19 03:38:08 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Sun, 19 Dec 2010 00:38:08 -0800 Subject: [Biojava-l] how to cancel download chemcomp when parser a PDB file In-Reply-To: References:

Message-ID:

Hi Daniel,

The chemical components provide the chemically correct definition of the various groups. There are quite a few chemically modified amino acids in PDB files which, based on these definitions, can be represented as amino acids rather than Hetatom groups. This has an impact on the sequence alignment that is done during the alignSeqRes process. Without the correct representations, those groups would be flagged as "X", or might be missing.

I will set up a wiki page which explains all the parsing options in detail.

About your second comment: there is now a new ChemCompProvider interface. Perhaps there should be a new implementation of it which downloads the file that contains all chemcomps bundled and provides the data from there.

Andreas

From andreas at sdsc.edu Sun Dec 19 09:29:13 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 19 Dec 2010 06:29:13 -0800
Subject: [Biojava-l] biojava3.0-beta4
Message-ID:

The fourth beta version of BioJava 3.0 has been released in our Maven repository: http://biojava.org/download/maven/

New things:
- tons of javadoc improvements
- a patch in the protmod module
- bug fix regarding automated download of chemcomp files in the structure module

A new set of javadoc files is available from: http://www.biojava.org/docs/api_latest/

Andreas

From wuuter at gmail.com Mon Dec 20 00:31:24 2010
From: wuuter at gmail.com (Fico)
Date: Mon, 20 Dec 2010 13:31:24 +0800
Subject: [Biojava-l] how to cancel download chemcomp when parser a PDB file
In-Reply-To:
References:

Message-ID:

The ChemComp download question is now resolved, but I found a new problem when testing BioJava3-Beta4. My program fragment:

    FileParsingParameters params = new FileParsingParameters();
    params.setLoadChemCompInfo(false);
    params.setHeaderOnly(false);
    // params.setParseCAOnly(true);
    params.setAlignSeqRes(true);
    params.setParseSecStruc(false);

    // loop file
    for (String file : getPdbFiles()) {

        PDBFileReader pdbreader = new PDBFileReader();
        pdbreader.setAutoFetch(false);
        pdbreader.setPath(getPdbDir());

        pdbreader.setFileParsingParameters(params);

        // pdbreader.setLoadChemCompInfo(true);
        Structure struc = null;
        try {
            struc = pdbreader.getStructure(getPdbDir() + "\\" + file);
        } catch (IOException e) {
            e.printStackTrace();
        }

        String pdbid = struc.getPDBCode();

        for (int i = 0; i < struc.nrModels(); i++) {

            // loop chain
            for (Chain ch : struc.getModel(i)) {
                System.out.println(pdbid + ">>>" + ch.getChainID() + ">>>"
                        + ch.getAtomSequence());
                System.out.println(pdbid + ">>>" + ch.getChainID() + ">>>"
                        + ch.getSeqResSequence());

                // Test the getAtomGroups() and getSeqResGroups() method
                // List<Group> group = ch.getAtomGroups();
                List<Group> group = ch.getSeqResGroups();
                for (Group gp : group) {
                    System.out.println(gp.getResidueNumber() + ":"
                            + gp.getPDBName());
                }
            }
        }
    }

My test PDB file is 1O1G.pdb, which has 45 modified residues in chain A. When I use getAtomGroups(), I get the atom information for all residues, such as ResidueNumber and PDBName:

    797:PHE
    798:LEU
    799:MET
    800:ARG
    801:VAL
    802:GLU
    ......
    840:PRO
    841:LEU
    842:LEU
    843:LYS

But when I use getSeqResGroups(), the last 45 residues are missing some information, such as ResidueNumber and atom coordinates. The output of the program is:

    797:PHE
    798:LEU
    null:MET
    null:ARG
    null:VAL
    null:GLU
    ......
    null:PRO
    null:LEU
    null:LEU
    null:LYS

In biojava3-Beta1 the two methods produce the same result, matching what getAtomGroups() returns in Beta4. So is this a bug?

P.S. Could we add a new method to get the whole amino acid sequence, including modified residues, directly? Currently neither getAtomSequence() nor getSeqResSequence() can do this. If I want the amino acid sequence with modified residues, I have to call getAtomGroups() or getSeqResGroups() first and then loop over each residue to build the one-letter amino acid sequence.

From andreas at sdsc.edu Tue Dec 21 18:55:39 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 21 Dec 2010 15:55:39 -0800
Subject: [Biojava-l] how to cancel download chemcomp when parser a PDB file
In-Reply-To:
References:

Message-ID:

Hi Fico,

- you are right, this was a bug (some index was off). I committed a patch for this to SVN.

- I also added new behaviour for downloading chem comp files: the default chem comp provider will fetch the components.cif.gz file and extract all definitions into small files, which will be used from then on.

- not sure about your last question. That is kind of already possible, I believe. You can use the getChemComp method to get the exact definition for a group.

Andreas

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From andreas at sdsc.edu Tue Dec 28 16:22:57 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 28 Dec 2010 13:22:57 -0800
Subject: [Biojava-l] BioJava 3.0 released
Message-ID:

BioJava 3.0 has been released and is available from http://biojava.org/wiki/BioJava:Download .

BioJava is a mature open-source project that provides a framework for processing of biological data. BioJava contains powerful analysis and statistical routines, tools for parsing common file formats, and packages for manipulating sequences and 3D structures. It enables rapid bioinformatics application development in the Java programming language.

Over the last year BioJava has undergone a major re-write. It has been modularized into small, re-usable components and a number of new features have been added. The new approach, modeled after the apache commons, minimizes dependencies and allows for easier contribution of new components.

At present, the main modules are:

biojava3-core: The core module offers the basic tools required for working with biological sequences of various types (DNA, RNA, protein). Besides file parsers for popular file formats it provides efficient data structures for sequence manipulation and serialization.

biojava3-genome: The genome module provides support for reading and writing of gtf, gff2, gff3 file formats.

biojava3-alignment: This module provides implementations for pairwise and multiple sequence alignments (MSA). The implementation for MSA provides a flexible and multi-threaded framework that works in linear space and that, as an option, allows the users to define anchors that are used in the build up of the multiple alignment.

biojava3-structure: The 3D protein structure module provides parsers and a data model for working with PDB and mmCif files. New features in this release are the implementation of the CE and FATCAT structural alignment algorithms and the support of chemical component definition files, for a chemically and biologically correct representation of modified residues and ligands.

biojava3-protmod: The protein modification module can detect more than 200 protein modifications and crosslinks in 3D protein structures. It comes with an XML file and Java data structures to store information about different types of protein modifications collected from PDB, RESID, and PSI-MOD.

Not every feature of the BioJava 1.X code base was migrated over to BioJava 3.0. A modularized version of the 1.X sources is available as a new "biojava-legacy" project.

Thanks to all contributors for making this release possible.
Happy Biojava-ing,
Andreas

From science.translator at gmail.com Tue Dec 28 18:24:49 2010
From: science.translator at gmail.com (Matthew Busse)
Date: Tue, 28 Dec 2010 15:24:49 -0800
Subject: [Biojava-l] QBlast in BioJava3
Message-ID:

Hello BioJava community,

I'm a biologist/Java newbie, I'm trying to write a program that automates BLAST requests using very short peptide sequences. I downloaded the new BioJava 3 jars, and attempted to write the program, but I quickly ran into a problem.

Here's the code I have so far:

package com.multiBLAST.model;

import org.biojava3.ws.alignment.qblast.NCBIQBlastAlignmentProperties;
import org.biojava3.ws.alignment.qblast.NCBIQBlastOutputProperties;
import org.biojava3.ws.alignment.qblast.NCBIQBlastService;
import org.biojava3.core.sequence.*;

public class BLASTExample {

    public static void main(String [] args) {

        NCBIQBlastService blaster;
        NCBIQBlastAlignmentProperties alignmentProperties;
        String request = "";
        final String TESTSEQUENCE = "MAQGTLIRVTPEQPTHAVCV";

        try {
            blaster = new NCBIQBlastService();
            alignmentProperties = new NCBIQBlastAlignmentProperties();
            alignmentProperties.setBlastProgram("blastp");
            alignmentProperties.setBlastDatabase("nr");
            request = blaster.sendAlignmentRequest(TESTSEQUENCE, alignmentProperties);
        }
    }
}

Eclipse is telling me that last line of code causes a compiler error:

"The type org.biojavax.bio.seq.RichSequence cannot be resolved. It is indirectly referenced from required .class files"

I'm not sure exactly what this means, it seems to me that perhaps when this method was updated from biojavax to biojava3, it somewhere still references the RichSequence from biojavax. I can't find a RichSequence class anywhere in the new BioJava3 API, and I can't find the biojavax jars on the legacy download page, so I'm not sure how to proceed.

If someone could please provide some assistance, that would be great.
Thank you very much,
Matthew Busse, PhD

From sylvain.foisy at diploide.net Wed Dec 29 12:22:52 2010
From: sylvain.foisy at diploide.net (Sylvain Foisy Ph. D.)
Date: Wed, 29 Dec 2010 12:22:52 -0500
Subject: [Biojava-l] QBlast in BioJava3
Message-ID:

Hi,

I am the author/main culprit for the QBlast code in BJ. I have to fix the problem that you found ASAP to remove the dependency on the old BJ architecture for representing Sequence objects. I'll work on this early next week, as soon as I have finished my grading... I am more of a teacher than a coder nowadays.

Best regards

Sylvain

From andreas at sdsc.edu Wed Dec 29 12:58:43 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 29 Dec 2010 09:58:43 -0800
Subject: [Biojava-l] QBlast in BioJava3
In-Reply-To:
References:
Message-ID:

Thanks Sylvain,

Matthew, a workaround until we have a fix for this is to add the
http://biojava.org/download/maven/org/biojava/core/1.8/core-1.8.jar
from the biojava-legacy project to your classpath. This should allow your example to work...

Andreas

From science.translator at gmail.com Wed Dec 29 14:52:13 2010
From: science.translator at gmail.com (Matthew Busse)
Date: Wed, 29 Dec 2010 11:52:13 -0800
Subject: [Biojava-l] QBlast in BioJava3
In-Reply-To:
References:
Message-ID:

Thank you everyone for your prompt responses. Adding the core-1.8.jar to the build path took care of the problem.

Cheers!
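A side note on the workaround above: a quick way to confirm that the legacy jar is really visible at run time is to probe for the class by name. The sketch below is plain Java with no BioJava dependency. The class name ClasspathProbe is made up for illustration, and the two class names it looks for are simply the ones mentioned in this thread. It only inspects the classpath of the JVM it runs in, so treat it as a hint, not as a replacement for fixing the build path in the IDE.

```java
final class ClasspathProbe {

    /** Returns true if the named class can be found by this class's loader. */
    public static boolean isOnClasspath(String className) {
        try {
            // 'false' = locate the class without running its static initializers.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but one of its own dependencies is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the messages in this thread.
        String[] names = {
            "org.biojavax.bio.seq.RichSequence",
            "org.biojava3.ws.alignment.qblast.NCBIQBlastService"
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Run it with the same classpath as the BLAST program; a MISSING line for RichSequence would suggest that core-1.8.jar has not been picked up.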
Matthew

From science.translator at gmail.com Wed Dec 29 16:01:33 2010
From: science.translator at gmail.com (Matthew Busse)
Date: Wed, 29 Dec 2010 13:01:33 -0800
Subject: [Biojava-l] QBlast in BioJava3
In-Reply-To:
References:
Message-ID:

Hello Sylvain, et al.,

I think I may have found another similar issue.
Here's my program:

package com.multiBLAST.model;

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.Set;

import org.biojava3.ws.alignment.qblast.NCBIQBlastAlignmentProperties;
import org.biojava3.ws.alignment.qblast.NCBIQBlastOutputFormat;
import org.biojava3.ws.alignment.qblast.NCBIQBlastOutputProperties;
import org.biojava3.ws.alignment.qblast.NCBIQBlastService;

public class BLASTExample {

    public static void main(String [] args) {

        NCBIQBlastService blaster;
        NCBIQBlastOutputProperties outputProperties;
        InputStream is;
        String request = "";
        final String TESTSEQUENCE = "MAQGTLIRVTPEQPTHAVCV";
        String rid = new String();

        try {
            blaster = new NCBIQBlastService();
            NCBIQBlastAlignmentProperties alignmentProperties = new NCBIQBlastAlignmentProperties();
            alignmentProperties.setBlastProgram("blastp");
            alignmentProperties.setBlastDatabase("nr");

            request = blaster.sendAlignmentRequest(TESTSEQUENCE, alignmentProperties);

            System.out.println("Trying to get BLAST results for RID " + rid);

            boolean wasBlasted = false;

            while (!wasBlasted) {
                wasBlasted = blaster.isReady(rid, System.currentTimeMillis());
            }

            outputProperties = new NCBIQBlastOutputProperties();
            outputProperties.setOutputFormat(NCBIQBlastOutputFormat.TEXT);
            outputProperties.setAlignmentOutputFormat(NCBIQBlastOutputFormat.PAIRWISE);
            outputProperties.setDescriptionNumber(10);
            outputProperties.setAlignmentNumber(10);

            //to show that output options were followed

            Set<String> test = outputProperties.getOutputOptions();

            for(String str : test) {
                System.out.println(str);
            }

            is = blaster.getAlignmentResults(request, outputProperties);

            BufferedReader br = new BufferedReader(new InputStreamReader(is));

            String line = null;

            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }

        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}

When I run it, it throws an exception:

java.lang.Exception: The key named PROGRAM is not set in this RemoteQBlastOutputProperties object
    at org.biojava3.ws.alignment.qblast.NCBIQBlastAlignmentProperties.getAlignmentOption(NCBIQBlastAlignmentProperties.java:173)
    at org.biojava3.ws.alignment.qblast.NCBIQBlastService.sendActualAlignementRequest(NCBIQBlastService.java:132)
    at org.biojava3.ws.alignment.qblast.NCBIQBlastService.sendAlignmentRequest(NCBIQBlastService.java:210)
    at com.multiBLAST.model.BLASTExample.main(BLASTExample.java:30)

Because RemoteQBlastOutputProperties is the terminology used in the pre-BJ3 APIs, I'm guessing this is another conversion problem? Or am I missing something else?

Many thanks for all your help.

Best,
Matthew

From sylvain.foisy at diploide.net Wed Dec 29 16:15:57 2010
From: sylvain.foisy at diploide.net (Sylvain Foisy Ph. D.)
Date: Wed, 29 Dec 2010 16:15:57 -0500
Subject: [Biojava-l] QBlast in BioJava3
In-Reply-To:
References:
Message-ID:

Hi Matthew,

Let me dive in this code again and I'll find the bug. Any rush on this? I am planning to do this early next week/year ;-)

You are actually the first reported user of this module beside me!
I mostly tested it with nucleotide sequences on the old architecture. Not much work has been done since moving to the new code base...

Back to you ASAP

Best regards

Sylvain

From andreas.prlic at gmail.com Thu Dec 30 13:27:09 2010
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Thu, 30 Dec 2010 10:27:09 -0800
Subject: [Biojava-l] BioJava3 - Structure - Bug
In-Reply-To:
References:
Message-ID:

Hi Jose,

Thanks for spotting this. I disabled the get/setPDBline a while ago since it essentially doubles memory usage. It would be better to have an Atom.toPDB or a FileConvert.toPDB(Atom) method that builds up the String output dynamically. I will flag the method calls as deprecated.

Andreas

2010/12/30 José Augusto Salim :
> Hi Andreas,
>
> I have used BioJava3 and I found some kind of bug in it. When I try to use
> the function getPDBline from the Atom interface it returns a blank string,
> because in PDBFileParser no pdbLine is set when a new AtomImpl is created.
> So I changed the code to make it work, just put the line:
>
> atom.setPDBline(line.trim());
>
> in PDBFileParser.java
>
> I'm sending the file for you.
>
> thanks for BioJava3 it has been very useful for my work.
>
> best regards,
>
> ===
> José Augusto Salim
> Computer Engineer
> -------------------------------
> http://www.thezeitgeistmovement.com
> http://www.zeitgeistmovie.com

From philipp.comans at mytum.de Fri Dec 31 12:07:46 2010
From: philipp.comans at mytum.de (Philipp Comans)
Date: Fri, 31 Dec 2010 18:07:46 +0100
Subject: [Biojava-l] Error parsing GFF3 file
Message-ID: <6D0D6C61-35E4-406A-8554-4C8AB82449F0@mytum.de>

Hello everyone,

I am trying to parse the file available here:
ftp://ftp.jgi-psf.org/pub/JGI_data/Amphimedon_queenslandica/annotation/Aqu1.gff3.gz
with the following commands:

import java.util.Iterator;

import org.biojava3.genome.parsers.gff.FeatureI;
import org.biojava3.genome.parsers.gff.FeatureList;
import org.biojava3.genome.parsers.gff.GFF3Reader;

public class GFFReader3 {

    public static void main(String[] args) throws Exception {

        FeatureList features = (FeatureList) GFF3Reader.read("/Users/philipp/Dropbox/IDP/JGI_data/annotation/Smiles.gff3");
        Iterator<FeatureI> featureIterator = features.iterator();

        FeatureI currentFeature = null;

        while (featureIterator.hasNext()) {
            currentFeature = featureIterator.next();
            System.out.println(currentFeature);
        }

    }

}

The error I get is:

31.12.2010 18:05:10 org.biojava3.genome.parsers.gff.GFF3Reader read
INFO: Gff.read(): Reading /Users/philipp/Dropbox/IDP/JGI_data/annotation/Aqu1.gff3
Exception in thread "main" java.lang.IllegalArgumentException: Improper location parameters: (-1864985,746)
    at org.biojava3.genome.parsers.gff.Location.<init>(Location.java:75)
    at org.biojava3.genome.parsers.gff.Location.union(Location.java:258)
    at org.biojava3.genome.parsers.gff.FeatureList.add(FeatureList.java:49)
    at org.biojava3.genome.parsers.gff.GFF3Reader.read(GFF3Reader.java:59)
    at GFFReader3.main(GFFReader3.java:11)

I find this very strange because the file is a valid GFF document according to
http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online

Is this a bug or am I doing something wrong?
Thanks for your help, I wish you a happy New Year!
Philipp

From willishf at ufl.edu Fri Dec 31 13:21:48 2010
From: willishf at ufl.edu (Scooter Willis)
Date: Fri, 31 Dec 2010 13:21:48 -0500
Subject: [Biojava-l] Error parsing GFF3 file
In-Reply-To: <6D0D6C61-35E4-406A-8554-4C8AB82449F0@mytum.de>
References: <6D0D6C61-35E4-406A-8554-4C8AB82449F0@mytum.de>
Message-ID:

Philipp

I think it is complaining about the negative location (-1864985,746). Is this a circular genome? That seems to be a rather large sequence segment and I think it is correct to complain about the negative location. We tried to plan ahead for circular genomes and genes that cross the begin/end boundary, and at the same time not have the programmer's brain explode trying to handle all the combinations that exist. It gets really fun when you have a negative strand.

One of the challenges of a valid gff3 file is that you can make sure the ontology is correct and the file format is correct, but when you try to bring it all together to do something with the data (turn it into a protein) you need to check harder. If this is a valid location, can you send me the gff3 segment and the DNA sequence that describes the features, and I will see what I can do to make it work without previous reference to head exploding.

Let me know what the end goal is on parsing the gff3 file and what is missing when you try to map to a GeneSequence/ProteinSequence.
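One way to "check harder" on the file itself, before it reaches any parser, is to scan the start and end columns of every feature line. The following is a minimal sketch, assuming a tab-separated GFF3 file with start and end in columns 4 and 5 (1-based, start not greater than end). The class and method names are made up for illustration, the sample records are invented and not taken from Aqu1.gff3, and the code does not use BioJava, so it says nothing about how GFF3Reader builds its Location objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class Gff3CoordinateCheck {

    /** Returns one message per feature line whose start/end columns look wrong. */
    public static List<String> findBadLocations(List<String> lines) {
        List<String> problems = new ArrayList<>();
        int lineNo = 0;
        for (String line : lines) {
            lineNo++;
            if (line.startsWith("##FASTA")) {
                break; // everything after this directive is sequence, not features
            }
            if (line.trim().isEmpty() || line.startsWith("#")) {
                continue; // blank line, comment or directive
            }
            String[] cols = line.split("\t", -1);
            if (cols.length != 9) {
                problems.add("line " + lineNo + ": expected 9 columns, found " + cols.length);
                continue;
            }
            try {
                long start = Long.parseLong(cols[3]);
                long end = Long.parseLong(cols[4]);
                if (start < 1 || end < start) {
                    problems.add("line " + lineNo + ": suspicious location (" + start + "," + end + ")");
                }
            } catch (NumberFormatException e) {
                problems.add("line " + lineNo + ": start/end are not integers");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Invented sample records: one good line, one with end < start, one negative start.
        List<String> sample = Arrays.asList(
            "##gff-version 3",
            "contig1\tsrc\tgene\t100\t900\t.\t+\t.\tID=g1",
            "contig1\tsrc\texon\t900\t100\t.\t-\t.\tParent=g1",
            "contig1\tsrc\texon\t-5\t40\t.\t+\t.\tParent=g1");
        for (String problem : findBadLocations(sample)) {
            System.out.println(problem);
        }
    }
}
```

If such a scan reports nothing for the real file, the raw coordinates are probably fine and a value like (-1864985,746) is more likely produced while locations are being combined, which would be worth mentioning when sending the gff3 segment.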
Thanks Scooter On Fri, Dec 31, 2010 at 12:07 PM, Philipp Comans wrote: > Hello everyone, > > I am trying to parse the file available here: > ftp://ftp.jgi-psf.org/pub/JGI_data/Amphimedon_queenslandica/annotation/Aqu1.gff3.gz > with the following commands: > > import java.util.Iterator; > > import org.biojava3.genome.parsers.gff.FeatureI; > import org.biojava3.genome.parsers.gff.FeatureList; > import org.biojava3.genome.parsers.gff.GFF3Reader; > > public class GFFReader3 { > > public static void main(String[] args) throws Exception { > > FeatureList features = (FeatureList) > GFF3Reader.read("/Users/philipp/Dropbox/IDP/JGI_data/annotation/Smiles.gff3"); > Iterator featureIterator = features.iterator(); > > FeatureI currentFeature = null; > > while (featureIterator.hasNext()) { > currentFeature = featureIterator.next(); > System.out.println(currentFeature); > } > > } > > } > > The error I get is: > 31.12.2010 18:05:10 org.biojava3.genome.parsers.gff.GFF3Reader read > INFO: Gff.read(): Reading > /Users/philipp/Dropbox/IDP/JGI_data/annotation/Aqu1.gff3 > Exception in thread "main" java.lang.IllegalArgumentException: Improper > location parameters: (-1864985,746) > at org.biojava3.genome.parsers.gff.Location.(Location.java:75) > at org.biojava3.genome.parsers.gff.Location.union(Location.java:258) > at > org.biojava3.genome.parsers.gff.FeatureList.add(FeatureList.java:49) > at > org.biojava3.genome.parsers.gff.GFF3Reader.read(GFF3Reader.java:59) > at GFFReader3.main(GFFReader3.java:11) > > I find this very strange because the file is a valid GFF document according > to > http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online > > Is this a bug or am I doing something wrong? > Thanks for your help, I wish you a happy New Year! 
> > Philipp > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > From andreas at sdsc.edu Sat Dec 4 17:46:08 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Sat, 4 Dec 2010 09:46:08 -0800 Subject: [Biojava-l] PDBFileParser question using PDBID 470D In-Reply-To: References: Message-ID: Hi Steve, > I've been using biojava to gather sequence data from structure files for an internal project. My intent was to test the limitations of my work (hence files similar to 470D), but came across this behavior in biojava. ok > It is not critical to obtain this particular mapping since it can be derived from the atom records. However, I didn't understand why the SEQRES list would be empty and was looking for clarification. Is it because the chain is RNA and the empty list prevents the unsupported alignment of RNA records? When working with PDB files the list gets built up after the alignment. Since RNA alignment is not supported by the parser, the list can't get created... In principle mmCif files contain the info how to join SEQRES and ATOM groups correctly and no alignment is needed. I will take a look again, how this works in this case... Andreas > -----Original Message----- > From: andreas.prlic at gmail.com [mailto:andreas.prlic at gmail.com] On Behalf Of Andreas Prlic > Sent: Monday, November 29, 2010 6:36 PM > To: Steve Darnell > Cc: biojava-l at lists.open-bio.org > Subject: Re: [Biojava-l] PDBFileParser question using PDBID 470D > > Hi Steve, > > as you already are saying, this is an "exotic" sequence, in the sense > that this is an RNA. The alignments of the SEQRES records for RNA > currently is not supported as of yet. Can you explain a bit more what > you are doing and why you need this mapping in this case? 
> > Thanks, > Andreas > > On Mon, Nov 29, 2010 at 12:51 PM, Steve Darnell wrote: >> Greetings, >> >> After parsing PDBID 470D with biojava-3.0-alpha5, Chain A returns an >> empty SEQRES sequence (Chain.getSeqResSequence) and empty SEQRES group >> list (Chain.getSeqResGroups) but the one-letter ATOM sequence is >> properly translated and the ATOM group list contains the appropriate >> number of groups (LoadChemCompInfo set to true). >> >> This is an exotic sequence, but my expectation is that the SEQRES group >> list would have members in it (and one-letter sequence translated if >> LoadChemCompInfo is true). Am I mistaken and the current behavior is >> the intended result? >> >> Best regards, >> Steve Darnell >> >> -- >> SEQRES records exist in 470D: >> >> SEQRES   1 A   12  C43 G48 C43 G48 A44 A44 U36 U36 C43 G48 C43 G48 >> >> SEQRES   1 B   12  C43 G48 C43 G48 A44 A44 U36 U36 C43 G48 C43 G48 >> >> >> >> Sample println output (ln 1 record type, ln 2 get${TYPE}Sequence, ln 3 >> get${TYPE}Groups): >> >> SEQRES >> '' >> [] >> >> ATOM >> 'CGCGAAUUCGCG' >> [PDB: C43 1 trueatoms: 21, PDB: G48 2 trueatoms: 27, PDB: C43 3 >> trueatoms: 24, ...] >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > > > > -- > ----------------------------------------------------------------------- > Dr. Andreas Prlic > Senior Scientist, RCSB PDB Protein Data Bank > University of California, San Diego > (+1) 858.246.0526 > ----------------------------------------------------------------------- > -- ----------------------------------------------------------------------- Dr. 
Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From cfriedline at vcu.edu Mon Dec 6 12:45:38 2010 From: cfriedline at vcu.edu (Chris Friedline) Date: Mon, 6 Dec 2010 07:45:38 -0500 Subject: [Biojava-l] SequenceMixin Error in BioJava3 Alignment Message-ID: Hello, Found another potential error case, this time in beta2 (fresh pull from git last evening). For more info, please see http://pastie.org/1351388 for test case and stack trace. The JUnit test passes simply because the pair object is not null, but fails when trying to extract any information from the pair itself (toString(), getIdenticals(), etc). The substitution matrix file is from ftp://ftp.ncbi.nih.gov/blast/matrices. I'm doing large numbers of pairwise alignments, which do not all fail, but most do with this same error. Thanks, Chris -- PhD Candidate, Integrative Life Sciences Virginia Commonwealth University Richmond, VA From ayates at ebi.ac.uk Mon Dec 6 13:50:20 2010 From: ayates at ebi.ac.uk (Andy Yates) Date: Mon, 6 Dec 2010 13:50:20 +0000 Subject: [Biojava-l] SequenceMixin Error in BioJava3 Alignment In-Reply-To: References: Message-ID: <610B0600-699D-4249-A6F6-27EBEAEAD585@ebi.ac.uk> Hi Chris, Well that's going into my toStringBuilder() method & that particular line is concerned with asking a compound for its String representation. How often do we get nulls in our Sequences and how to deal with them. After all the Sequence AGTCNULLAGTC is probably more harmful than helpful Andy On 6 Dec 2010, at 12:45, Chris Friedline wrote: > Hello, > > Found another potential error case, this time in beta2 (fresh pull > from git last evening). For more info, please see > http://pastie.org/1351388 for test case and stack trace. 
The JUnit > test passes simply because the pair object is not null, but fails when > trying to extract any information from the pair itself (toString(), > getIdenticals(), etc). The substitution matrix file is from > ftp://ftp.ncbi.nih.gov/blast/matrices. I'm doing large numbers of > pairwise alignments, which do not all fail, but most do with this same > error. > > Thanks, > Chris > > -- > PhD Candidate, Integrative Life Sciences > Virginia Commonwealth University > Richmond, VA > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From ayates at ebi.ac.uk Mon Dec 6 15:13:49 2010 From: ayates at ebi.ac.uk (Andy Yates) Date: Mon, 6 Dec 2010 15:13:49 +0000 Subject: [Biojava-l] SequenceMixin Error in BioJava3 Alignment In-Reply-To: <610B0600-699D-4249-A6F6-27EBEAEAD585@ebi.ac.uk> References: <610B0600-699D-4249-A6F6-27EBEAEAD585@ebi.ac.uk> Message-ID: <8C833CB3-C110-496B-9285-1A3FE150ED01@ebi.ac.uk> So myself & Chris have discussed this off list & we believe it's because of a NULL compound element in the Sequence given to the SequenceMixin method. Does anyone on list know how the AlignedSequence code encodes gaps & the alike? Andy On 6 Dec 2010, at 13:50, Andy Yates wrote: > Hi Chris, > > Well that's going into my toStringBuilder() method & that particular line is concerned with asking a compound for its String representation. How often do we get nulls in our Sequences and how to deal with them. After all the Sequence AGTCNULLAGTC is probably more harmful then helpful > > Andy > > On 6 Dec 2010, at 12:45, Chris Friedline wrote: > >> Hello, >> >> Found another potential error case, this time in beta2 (fresh pull >> from git last evening). For more info, please see >> http://pastie.org/1351388 for test case and stack trace. 
The JUnit >> test passes simply because the pair object is not null, but fails when >> trying to extract any information from the pair itself (toString(), >> getIdenticals(), etc). The substitution matrix file is from >> ftp://ftp.ncbi.nih.gov/blast/matrices. I'm doing large numbers of >> pairwise alignments, which do not all fail, but most do with this same >> error. >> >> Thanks, >> Chris >> >> -- >> PhD Candidate, Integrative Life Sciences >> Virginia Commonwealth University >> Richmond, VA >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From andreas at sdsc.edu Mon Dec 6 17:22:55 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Mon, 6 Dec 2010 09:22:55 -0800 Subject: [Biojava-l] SequenceMixin Error in BioJava3 Alignment In-Reply-To: <8C833CB3-C110-496B-9285-1A3FE150ED01@ebi.ac.uk> References: <610B0600-699D-4249-A6F6-27EBEAEAD585@ebi.ac.uk> <8C833CB3-C110-496B-9285-1A3FE150ED01@ebi.ac.uk> Message-ID: Hi Andy, Check out the SimpleAlignedSequence class, for how Gaps are handled... Does that help? Andreas On Mon, Dec 6, 2010 at 7:13 AM, Andy Yates wrote: > So myself & Chris have discussed this off list & we believe it's because of a NULL compound element in the Sequence given to the SequenceMixin method. > > Does anyone on list know how the AlignedSequence code encodes gaps & the alike? 
> > Andy > > On 6 Dec 2010, at 13:50, Andy Yates wrote: > >> Hi Chris, >> >> Well that's going into my toStringBuilder() method & that particular line is concerned with asking a compound for its String representation. How often do we get nulls in our Sequences and how to deal with them. After all the Sequence AGTCNULLAGTC is probably more harmful then helpful >> >> Andy >> >> On 6 Dec 2010, at 12:45, Chris Friedline wrote: >> >>> Hello, >>> >>> Found another potential error case, this time in beta2 (fresh pull >>> from git last evening). For more info, please see >>> http://pastie.org/1351388 for test case and stack trace. The JUnit >>> test passes simply because the pair object is not null, but fails when >>> trying to extract any information from the pair itself (toString(), >>> getIdenticals(), etc). The substitution matrix file is from >>> ftp://ftp.ncbi.nih.gov/blast/matrices. I'm doing large numbers of >>> pairwise alignments, which do not all fail, but most do with this same >>> error. >>> >>> Thanks, >>> Chris >>> >>> -- >>> PhD Candidate, Integrative Life Sciences >>> Virginia Commonwealth University >>> Richmond, VA >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l > > -- > Andrew Yates Ensembl Genomes Engineer > EMBL-EBI Tel: +44-(0)1223-492538 > Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 > Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ > > > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- ----------------------------------------------------------------------- Dr. 
Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From cfriedline at vcu.edu Mon Dec 6 18:28:37 2010 From: cfriedline at vcu.edu (Chris Friedline) Date: Mon, 6 Dec 2010 13:28:37 -0500 Subject: [Biojava-l] SequenceMixin Error in BioJava3 Alignment In-Reply-To: References: <610B0600-699D-4249-A6F6-27EBEAEAD585@ebi.ac.uk> <8C833CB3-C110-496B-9285-1A3FE150ED01@ebi.ac.uk> Message-ID: That does help, thanks. However, when calling getAsList() on the aligned sequences and printing, this is what I see. Something seems wrong. It does appear as though null is being inserted where there should be gaps seq = [A, A, C, A, C, T, T, G, A, C, A, T, G, T, T, C, null, G, T, C, G, C, A, A, C, T, T, T, T, A, A, G, A, G, A, T, T, A, G, A, G, T, T, T, T, C, G, G, T, T, C, G, G, C, C, G, G, A, C, G, A, A, A, C, A, C, null, null, null, null, null, null] seq = [T, A, C, C, C, T, T, A, A, C, A, T, null, null, T, C, A, G, T, G, A, C, A, A, C, C, T, C, null, null, A, G, A, G, A, T, G, A, G, null, G, C, T, T, T, C, T, C, T, T, C, G, G, null, null, null, null, null, null, null, A, G, A, C, A, C, T, G, G, G, A, T] Chris On Mon, Dec 6, 2010 at 12:22 PM, Andreas Prlic wrote: > Hi Andy, > > Check out the SimpleAlignedSequence class, for how Gaps are handled... > Does that help? > > Andreas > > On Mon, Dec 6, 2010 at 7:13 AM, Andy Yates wrote: >> So myself & Chris have discussed this off list & we believe it's because of a NULL compound element in the Sequence given to the SequenceMixin method. >> >> Does anyone on list know how the AlignedSequence code encodes gaps & the alike? >> >> Andy >> >> On 6 Dec 2010, at 13:50, Andy Yates wrote: >> >>> Hi Chris, >>> >>> Well that's going into my toStringBuilder() method & that particular line is concerned with asking a compound for its String representation. 
How often do we get nulls in our Sequences and how to deal with them. After all the Sequence AGTCNULLAGTC is probably more harmful then helpful >>> >>> Andy >>> >>> On 6 Dec 2010, at 12:45, Chris Friedline wrote: >>> >>>> Hello, >>>> >>>> Found another potential error case, this time in beta2 (fresh pull >>>> from git last evening). For more info, please see >>>> http://pastie.org/1351388 for test case and stack trace. The JUnit >>>> test passes simply because the pair object is not null, but fails when >>>> trying to extract any information from the pair itself (toString(), >>>> getIdenticals(), etc). The substitution matrix file is from >>>> ftp://ftp.ncbi.nih.gov/blast/matrices. I'm doing large numbers of >>>> pairwise alignments, which do not all fail, but most do with this same >>>> error. >>>> >>>> Thanks, >>>> Chris >>>> >>>> -- >>>> PhD Candidate, Integrative Life Sciences >>>> Virginia Commonwealth University >>>> Richmond, VA >>>> >>>> _______________________________________________ >>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> -- >> Andrew Yates Ensembl Genomes Engineer >> EMBL-EBI Tel: +44-(0)1223-492538 >> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 >> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ >> >> >> >> >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > > > > -- > ----------------------------------------------------------------------- > Dr. 
Andreas Prlic > Senior Scientist, RCSB PDB Protein Data Bank > University of California, San Diego > (+1) 858.246.0526 > ----------------------------------------------------------------------- > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- PhD Candidate, Integrative Life Sciences Virginia Commonwealth University Richmond, VA From cfriedline at vcu.edu Mon Dec 6 18:41:17 2010 From: cfriedline at vcu.edu (Chris Friedline) Date: Mon, 6 Dec 2010 13:41:17 -0500 Subject: [Biojava-l] SequenceMixin Error in BioJava3 Alignment In-Reply-To: References: <610B0600-699D-4249-A6F6-27EBEAEAD585@ebi.ac.uk> <8C833CB3-C110-496B-9285-1A3FE150ED01@ebi.ac.uk> Message-ID: OK, so here's a quick fix now that I know where to look. In my local source I added the following line to the constructor of DNACompoundSet and recompiled. addNucleotideCompound("-", "-"); Not sure if this is the correct place for it in terms of what the devs want to do globally, but it gets me moving forward again. Gap characters are in AminoAcidCompoundSet so I'm wondering if this was just a tiny oversight on the nucleotide front. Thanks again for the help everyone, Chris On Mon, Dec 6, 2010 at 1:28 PM, Chris Friedline wrote: > That does help, thanks. However, when calling getAsList() on the > aligned sequences and printing, this is what I see. Something seems > wrong. 
It does appear as though null is being inserted where there > should be gaps > > seq = [A, A, C, A, C, T, T, G, A, C, A, T, G, T, T, C, null, G, T, C, > G, C, A, A, C, T, T, T, T, A, A, G, A, G, A, T, T, A, G, A, G, T, T, > T, T, C, G, G, T, T, C, G, G, C, C, G, G, A, C, G, A, A, A, C, A, C, > null, null, null, null, null, null] > seq = [T, A, C, C, C, T, T, A, A, C, A, T, null, null, T, C, A, G, T, > G, A, C, A, A, C, C, T, C, null, null, A, G, A, G, A, T, G, A, G, > null, G, C, T, T, T, C, T, C, T, T, C, G, G, null, null, null, null, > null, null, null, A, G, A, C, A, C, T, G, G, G, A, T] > > Chris > > On Mon, Dec 6, 2010 at 12:22 PM, Andreas Prlic wrote: >> Hi Andy, >> >> Check out the SimpleAlignedSequence class, for how Gaps are handled... >> Does that help? >> >> Andreas >> >> On Mon, Dec 6, 2010 at 7:13 AM, Andy Yates wrote: >>> So myself & Chris have discussed this off list & we believe it's because of a NULL compound element in the Sequence given to the SequenceMixin method. >>> >>> Does anyone on list know how the AlignedSequence code encodes gaps & the alike? >>> >>> Andy >>> >>> On 6 Dec 2010, at 13:50, Andy Yates wrote: >>> >>>> Hi Chris, >>>> >>>> Well that's going into my toStringBuilder() method & that particular line is concerned with asking a compound for its String representation. How often do we get nulls in our Sequences and how to deal with them. After all the Sequence AGTCNULLAGTC is probably more harmful then helpful >>>> >>>> Andy >>>> >>>> On 6 Dec 2010, at 12:45, Chris Friedline wrote: >>>> >>>>> Hello, >>>>> >>>>> Found another potential error case, this time in beta2 (fresh pull >>>>> from git last evening). For more info, please see >>>>> http://pastie.org/1351388 for test case and stack trace. The JUnit >>>>> test passes simply because the pair object is not null, but fails when >>>>> trying to extract any information from the pair itself (toString(), >>>>> getIdenticals(), etc). 
The substitution matrix file is from >>>>> ftp://ftp.ncbi.nih.gov/blast/matrices. I'm doing large numbers of >>>>> pairwise alignments, which do not all fail, but most do with this same >>>>> error. >>>>> >>>>> Thanks, >>>>> Chris >>>>> >>>>> -- >>>>> PhD Candidate, Integrative Life Sciences >>>>> Virginia Commonwealth University >>>>> Richmond, VA >>>>> >>>>> _______________________________________________ >>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>> >>>> >>>> _______________________________________________ >>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >>> -- >>> Andrew Yates Ensembl Genomes Engineer >>> EMBL-EBI Tel: +44-(0)1223-492538 >>> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 >>> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >> >> >> >> -- >> ----------------------------------------------------------------------- >> Dr. 
Andreas Prlic >> Senior Scientist, RCSB PDB Protein Data Bank >> University of California, San Diego >> (+1) 858.246.0526 >> ----------------------------------------------------------------------- >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > > > > -- > PhD Candidate, Integrative Life Sciences > Virginia Commonwealth University > Richmond, VA > -- PhD Candidate, Integrative Life Sciences Virginia Commonwealth University Richmond, VA From ayates at ebi.ac.uk Mon Dec 6 19:32:55 2010 From: ayates at ebi.ac.uk (Andy Yates) Date: Mon, 6 Dec 2010 19:32:55 +0000 Subject: [Biojava-l] SequenceMixin Error in BioJava3 Alignment In-Reply-To: References: <610B0600-699D-4249-A6F6-27EBEAEAD585@ebi.ac.uk> <8C833CB3-C110-496B-9285-1A3FE150ED01@ebi.ac.uk> Message-ID: <91FB5471-89AD-463F-AB1F-9D93F5EA46C7@ebi.ac.uk> I would say partially an oversight on my part & partially done on purpose (a gap is not a nucleotide after all). However I'm all in favour of being pragmatic here so let's add them in. If I get an okay from the relevant parties I'll commit the change in. Andy On 6 Dec 2010, at 18:41, Chris Friedline wrote: > OK, so here's a quick fix now that I know where to look. In my local > source I added the following line to the constructor of DNACompoundSet > and recompiled. > > addNucleotideCompound("-", "-"); > > Not sure if this is the correct place for it in terms of what the devs > want to do globally, but it gets me moving forward again. Gap > characters are in AminoAcidCompoundSet so I'm wondering if this was > just a tiny oversight on the nucleotide front. > > Thanks again for the help everyone, > Chris > > On Mon, Dec 6, 2010 at 1:28 PM, Chris Friedline wrote: >> That does help, thanks. However, when calling getAsList() on the >> aligned sequences and printing, this is what I see. Something seems >> wrong. 
It does appear as though null is being inserted where there >> should be gaps >> >> seq = [A, A, C, A, C, T, T, G, A, C, A, T, G, T, T, C, null, G, T, C, >> G, C, A, A, C, T, T, T, T, A, A, G, A, G, A, T, T, A, G, A, G, T, T, >> T, T, C, G, G, T, T, C, G, G, C, C, G, G, A, C, G, A, A, A, C, A, C, >> null, null, null, null, null, null] >> seq = [T, A, C, C, C, T, T, A, A, C, A, T, null, null, T, C, A, G, T, >> G, A, C, A, A, C, C, T, C, null, null, A, G, A, G, A, T, G, A, G, >> null, G, C, T, T, T, C, T, C, T, T, C, G, G, null, null, null, null, >> null, null, null, A, G, A, C, A, C, T, G, G, G, A, T] >> >> Chris >> >> On Mon, Dec 6, 2010 at 12:22 PM, Andreas Prlic wrote: >>> Hi Andy, >>> >>> Check out the SimpleAlignedSequence class, for how Gaps are handled... >>> Does that help? >>> >>> Andreas >>> >>> On Mon, Dec 6, 2010 at 7:13 AM, Andy Yates wrote: >>>> So myself & Chris have discussed this off list & we believe it's because of a NULL compound element in the Sequence given to the SequenceMixin method. >>>> >>>> Does anyone on list know how the AlignedSequence code encodes gaps & the alike? >>>> >>>> Andy >>>> >>>> On 6 Dec 2010, at 13:50, Andy Yates wrote: >>>> >>>>> Hi Chris, >>>>> >>>>> Well that's going into my toStringBuilder() method & that particular line is concerned with asking a compound for its String representation. How often do we get nulls in our Sequences and how to deal with them. After all the Sequence AGTCNULLAGTC is probably more harmful then helpful >>>>> >>>>> Andy >>>>> >>>>> On 6 Dec 2010, at 12:45, Chris Friedline wrote: >>>>> >>>>>> Hello, >>>>>> >>>>>> Found another potential error case, this time in beta2 (fresh pull >>>>>> from git last evening). For more info, please see >>>>>> http://pastie.org/1351388 for test case and stack trace. The JUnit >>>>>> test passes simply because the pair object is not null, but fails when >>>>>> trying to extract any information from the pair itself (toString(), >>>>>> getIdenticals(), etc). 
The substitution matrix file is from >>>>>> ftp://ftp.ncbi.nih.gov/blast/matrices. I'm doing large numbers of >>>>>> pairwise alignments, which do not all fail, but most do with this same >>>>>> error. >>>>>> >>>>>> Thanks, >>>>>> Chris >>>>>> >>>>>> -- >>>>>> PhD Candidate, Integrative Life Sciences >>>>>> Virginia Commonwealth University >>>>>> Richmond, VA >>>>>> >>>>>> _______________________________________________ >>>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>>> >>>>> >>>>> _______________________________________________ >>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>> >>>> -- >>>> Andrew Yates Ensembl Genomes Engineer >>>> EMBL-EBI Tel: +44-(0)1223-492538 >>>> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 >>>> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>>> >>> >>> >>> >>> -- >>> ----------------------------------------------------------------------- >>> Dr. 
Andreas Prlic >>> Senior Scientist, RCSB PDB Protein Data Bank >>> University of California, San Diego >>> (+1) 858.246.0526 >>> ----------------------------------------------------------------------- >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >> >> >> >> -- >> PhD Candidate, Integrative Life Sciences >> Virginia Commonwealth University >> Richmond, VA >> > > > > -- > PhD Candidate, Integrative Life Sciences > Virginia Commonwealth University > Richmond, VA -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From willishf at ufl.edu Mon Dec 6 20:00:18 2010 From: willishf at ufl.edu (Scooter Willis) Date: Mon, 6 Dec 2010 15:00:18 -0500 Subject: [Biojava-l] SequenceMixin Error in BioJava3 Alignment In-Reply-To: <91FB5471-89AD-463F-AB1F-9D93F5EA46C7@ebi.ac.uk> References: <610B0600-699D-4249-A6F6-27EBEAEAD585@ebi.ac.uk> <8C833CB3-C110-496B-9285-1A3FE150ED01@ebi.ac.uk> <91FB5471-89AD-463F-AB1F-9D93F5EA46C7@ebi.ac.uk> Message-ID: It would be nice to have a cool indexing system that allowed dynamic indexes of the data model but not worth the headache. If we are going to go big we should use the same gap symbols that were added for protein sequences. Scooter On Mon, Dec 6, 2010 at 2:32 PM, Andy Yates wrote: > I would say partially an oversight on my part & partially done on purpose > (a gap is not a nucleotide after all). However I'm all in favour of being > pragmatic here so lets add them in. If I get an okay from the relevant > parties I'll commit the change in. > > Andy > > On 6 Dec 2010, at 18:41, Chris Friedline wrote: > > > OK, so here's a quick fix now that I know where to look. In my local > > source I added the following line to the constructor of DNACompoundSet > > and recompiled. 
> > > > addNucleotideCompound("-", "-"); > > > > Not sure if this is the correct place for it in terms of what the devs > > want to do globally, but it gets me moving forward again. Gap > > characters are in AminoAcidCompoundSet so I'm wondering if this was > > just a tiny oversight on the nucleotide front. > > > > Thanks again for the help everyone, > > Chris > > > > On Mon, Dec 6, 2010 at 1:28 PM, Chris Friedline > wrote: > >> That does help, thanks. However, when calling getAsList() on the > >> aligned sequences and printing, this is what I see. Something seems > >> wrong. It does appear as though null is being inserted where there > >> should be gaps > >> > >> seq = [A, A, C, A, C, T, T, G, A, C, A, T, G, T, T, C, null, G, T, C, > >> G, C, A, A, C, T, T, T, T, A, A, G, A, G, A, T, T, A, G, A, G, T, T, > >> T, T, C, G, G, T, T, C, G, G, C, C, G, G, A, C, G, A, A, A, C, A, C, > >> null, null, null, null, null, null] > >> seq = [T, A, C, C, C, T, T, A, A, C, A, T, null, null, T, C, A, G, T, > >> G, A, C, A, A, C, C, T, C, null, null, A, G, A, G, A, T, G, A, G, > >> null, G, C, T, T, T, C, T, C, T, T, C, G, G, null, null, null, null, > >> null, null, null, A, G, A, C, A, C, T, G, G, G, A, T] > >> > >> Chris > >> > >> On Mon, Dec 6, 2010 at 12:22 PM, Andreas Prlic > wrote: > >>> Hi Andy, > >>> > >>> Check out the SimpleAlignedSequence class, for how Gaps are handled... > >>> Does that help? > >>> > >>> Andreas > >>> > >>> On Mon, Dec 6, 2010 at 7:13 AM, Andy Yates wrote: > >>>> So myself & Chris have discussed this off list & we believe it's > because of a NULL compound element in the Sequence given to the > SequenceMixin method. > >>>> > >>>> Does anyone on list know how the AlignedSequence code encodes gaps & > the alike? 
> >>>> > >>>> Andy > >>>> > >>>> On 6 Dec 2010, at 13:50, Andy Yates wrote: > >>>> > >>>>> Hi Chris, > >>>>> > >>>>> Well that's going into my toStringBuilder() method & that particular > line is concerned with asking a compound for its String representation. How > often do we get nulls in our Sequences and how to deal with them. After all > the Sequence AGTCNULLAGTC is probably more harmful then helpful > >>>>> > >>>>> Andy > >>>>> > >>>>> On 6 Dec 2010, at 12:45, Chris Friedline wrote: > >>>>> > >>>>>> Hello, > >>>>>> > >>>>>> Found another potential error case, this time in beta2 (fresh pull > >>>>>> from git last evening). For more info, please see > >>>>>> http://pastie.org/1351388 for test case and stack trace. The JUnit > >>>>>> test passes simply because the pair object is not null, but fails > when > >>>>>> trying to extract any information from the pair itself (toString(), > >>>>>> getIdenticals(), etc). The substitution matrix file is from > >>>>>> ftp://ftp.ncbi.nih.gov/blast/matrices. I'm doing large numbers of > >>>>>> pairwise alignments, which do not all fail, but most do with this > same > >>>>>> error. 
> >>>>>> > >>>>>> Thanks, > >>>>>> Chris > >>>>>> > >>>>>> -- > >>>>>> PhD Candidate, Integrative Life Sciences > >>>>>> Virginia Commonwealth University > >>>>>> Richmond, VA > >>>>>> > >>>>>> _______________________________________________ > >>>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org > >>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l > >>>>> > >>>>> > >>>>> _______________________________________________ > >>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org > >>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l > >>>> > >>>> -- > >>>> Andrew Yates Ensembl Genomes Engineer > >>>> EMBL-EBI Tel: +44-(0)1223-492538 > >>>> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 > >>>> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> _______________________________________________ > >>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org > >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l > >>>> > >>> > >>> > >>> > >>> -- > >>> ----------------------------------------------------------------------- > >>> Dr. 
Andreas Prlic > >>> Senior Scientist, RCSB PDB Protein Data Bank > >>> University of California, San Diego > >>> (+1) 858.246.0526 > >>> ----------------------------------------------------------------------- > >>> > >>> _______________________________________________ > >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/biojava-l > >>> > >> > >> > >> > >> -- > >> PhD Candidate, Integrative Life Sciences > >> Virginia Commonwealth University > >> Richmond, VA > >> > > > > > > > > -- > > PhD Candidate, Integrative Life Sciences > > Virginia Commonwealth University > > Richmond, VA > > -- > Andrew Yates Ensembl Genomes Engineer > EMBL-EBI Tel: +44-(0)1223-492538 > Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 > Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ > > > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > From ayates at ebi.ac.uk Wed Dec 8 09:06:32 2010 From: ayates at ebi.ac.uk (Andy Yates) Date: Wed, 8 Dec 2010 09:06:32 +0000 Subject: [Biojava-l] SequenceMixin Error in BioJava3 Alignment In-Reply-To: References: <610B0600-699D-4249-A6F6-27EBEAEAD585@ebi.ac.uk> <8C833CB3-C110-496B-9285-1A3FE150ED01@ebi.ac.uk> <91FB5471-89AD-463F-AB1F-9D93F5EA46C7@ebi.ac.uk> Message-ID: <771AA57F-8AD4-4E9E-A1F0-EA44FA2518F2@ebi.ac.uk> I've added the gap symbol to DNA & RNA compound sets. Hopefully this error will go away. If not then we'll have to look into the alignment code & get it to use the gap symbol Andy On 6 Dec 2010, at 20:00, Scooter Willis wrote: > It would be nice to have a cool indexing system that allowed dynamic indexes of the data model but not worth the headache. If we are going to go big we should use the same gap symbols that were added for protein sequences. 
>
> Scooter
>
> On Mon, Dec 6, 2010 at 2:32 PM, Andy Yates wrote:
> I would say partially an oversight on my part & partially done on purpose (a gap is not a nucleotide after all). However I'm all in favour of being pragmatic here so lets add them in. If I get an okay from the relevant parties I'll commit the change in.
>
> Andy
>
> On 6 Dec 2010, at 18:41, Chris Friedline wrote:
>
> > OK, so here's a quick fix now that I know where to look. In my local
> > source I added the following line to the constructor of DNACompoundSet
> > and recompiled.
> >
> > addNucleotideCompound("-", "-");
> >
> > Not sure if this is the correct place for it in terms of what the devs
> > want to do globally, but it gets me moving forward again. Gap
> > characters are in AminoAcidCompoundSet so I'm wondering if this was
> > just a tiny oversight on the nucleotide front.
> >
> > Thanks again for the help everyone,
> > Chris
> >
> > On Mon, Dec 6, 2010 at 1:28 PM, Chris Friedline wrote:
> >> That does help, thanks. However, when calling getAsList() on the
> >> aligned sequences and printing, this is what I see. Something seems
> >> wrong. It does appear as though null is being inserted where there
> >> should be gaps
> >>
> >> seq = [A, A, C, A, C, T, T, G, A, C, A, T, G, T, T, C, null, G, T, C,
> >> G, C, A, A, C, T, T, T, T, A, A, G, A, G, A, T, T, A, G, A, G, T, T,
> >> T, T, C, G, G, T, T, C, G, G, C, C, G, G, A, C, G, A, A, A, C, A, C,
> >> null, null, null, null, null, null]
> >> seq = [T, A, C, C, C, T, T, A, A, C, A, T, null, null, T, C, A, G, T,
> >> G, A, C, A, A, C, C, T, C, null, null, A, G, A, G, A, T, G, A, G,
> >> null, G, C, T, T, T, C, T, C, T, T, C, G, G, null, null, null, null,
> >> null, null, null, A, G, A, C, A, C, T, G, G, G, A, T]
> >>
> >> Chris
> >>
> >> On Mon, Dec 6, 2010 at 12:22 PM, Andreas Prlic wrote:
> >>> Hi Andy,
> >>>
> >>> Check out the SimpleAlignedSequence class, for how Gaps are handled...
> >>> Does that help?
> >>>
> >>> Andreas
> >>>
> >>> On Mon, Dec 6, 2010 at 7:13 AM, Andy Yates wrote:
> >>>> So myself & Chris have discussed this off list & we believe it's because of a NULL compound element in the Sequence given to the SequenceMixin method.
> >>>>
> >>>> Does anyone on list know how the AlignedSequence code encodes gaps & the alike?
> >>>>
> >>>> Andy
> >>>>
> >>>> On 6 Dec 2010, at 13:50, Andy Yates wrote:
> >>>>
> >>>>> Hi Chris,
> >>>>>
> >>>>> Well that's going into my toStringBuilder() method & that particular line is concerned with asking a compound for its String representation. How often do we get nulls in our Sequences and how to deal with them. After all the Sequence AGTCNULLAGTC is probably more harmful then helpful
> >>>>>
> >>>>> Andy
> >>>>>
> >>>>> On 6 Dec 2010, at 12:45, Chris Friedline wrote:
> >>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> Found another potential error case, this time in beta2 (fresh pull
> >>>>>> from git last evening). For more info, please see
> >>>>>> http://pastie.org/1351388 for test case and stack trace. The JUnit
> >>>>>> test passes simply because the pair object is not null, but fails when
> >>>>>> trying to extract any information from the pair itself (toString(),
> >>>>>> getIdenticals(), etc). The substitution matrix file is from
> >>>>>> ftp://ftp.ncbi.nih.gov/blast/matrices. I'm doing large numbers of
> >>>>>> pairwise alignments, which do not all fail, but most do with this same
> >>>>>> error.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Chris

--
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From jayunit100 at gmail.com Sun Dec 12 22:42:09 2010
From: jayunit100 at gmail.com (Jay Vyas)
Date: Sun, 12 Dec 2010 17:42:09 -0500
Subject: [Biojava-l] predicting atoms from backbone
Message-ID:

Hi guys:

I'm trying to add C, O, and N atoms to a protein structure when I only have a backbone trace. That is, I have a series of CA atoms in the backbone of a protein structure, and I want to generate the other atom coordinates.

I know biojava can add hydrogens... Does anybody have any ideas about how to add and predict other atoms?
--
Jay Vyas
MMSB/UCHC

From andreas at sdsc.edu Mon Dec 13 15:56:12 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 13 Dec 2010 07:56:12 -0800
Subject: [Biojava-l] predicting atoms from backbone
In-Reply-To:
References:
Message-ID:

Hi Jay,

I guess you want to do something like this: http://peds.oxfordjournals.org/content/5/2/147.abstract

Although all the tools for such calculations are available in BioJava, there is currently no simple method call that calculates such a main-chain model. Before doing any coding, it is probably best to check whether other software, such as COOT, can do that...

Andreas

On Sun, Dec 12, 2010 at 2:42 PM, Jay Vyas wrote:
> Hi guys :
>
> Im trying to add C, O , and N atoms to a protein structure, when I only have
> a backbone trace..
> That is, I have a series of CA atoms in a backbone of a protein structure,
> and I want to generate other atom coordinates.
>
> I know biojava can add hydrogents... Anybody have any ideas about how to add
> and predict other atoms?
>
> --
> Jay Vyas
> MMSB/UCHC

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From wuuter at gmail.com Thu Dec 16 02:45:42 2010
From: wuuter at gmail.com (Fico)
Date: Thu, 16 Dec 2010 10:45:42 +0800
Subject: [Biojava-l] how to cancel download chemcomp when parser a PDB file
Message-ID:

Hi, dear all:

I have recently been using biojava3 beta1 to parse PDB files. My program is:

    PDBFileReader pdbreader = new PDBFileReader();
    pdbreader.setAutoFetch(false);
    pdbreader.setPath(pdbDirPath);

    FileParsingParameters params = new FileParsingParameters();
    params.setLoadChemCompInfo(false);
    params.setHeaderOnly(false);
    params.setAlignSeqRes(true);
    params.setParseSecStruc(false);
    pdbreader.setFileParsingParameters(params);

    Structure structure = null;
    try {
        structure = pdbreader.getStructure(pdbDirPath + "\\" + file);
    } catch (IOException e) {
        e.printStackTrace();
    }

When I execute this program, it downloads files such as:

    creating directory D:\MyWorkspace\TestFiles\pdbFiles\chemcomp
    downloading http://www.rcsb.org/pdb/files/ligand/35G.cif
    downloading http://www.rcsb.org/pdb/files/ligand/GDP.cif

but I do not want to download those files. How can I cancel it? Thanks.

From marcelito_tony at hotmail.com Thu Dec 16 03:59:11 2010
From: marcelito_tony at hotmail.com (marcelo rodriguez)
Date: Thu, 16 Dec 2010 03:59:11 +0000
Subject: [Biojava-l] Is there cvtree algorithm in biojava?
In-Reply-To:
References:
Message-ID:

I am investigating whether the cvtree algorithm is available in biojava.
I need help.

Regards,
marcelo

From darnells at dnastar.com Thu Dec 16 16:26:51 2010
From: darnells at dnastar.com (Steve Darnell)
Date: Thu, 16 Dec 2010 10:26:51 -0600
Subject: [Biojava-l] how to cancel download chemcomp when parser a PDB file
In-Reply-To:
References:
Message-ID:

The SeqRes to Atom record alignment forces the use of chemical components to translate non-standard residues to their closest standard counterpart for the sequence alignment. I have to disable both setLoadChemCompInfo and setAlignSeqRes when I don't want to download chemical component files from RCSB when parsing a PDB file.

Regards,
Steve

-----Original Message-----
From: biojava-l-bounces at lists.open-bio.org [mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Fico
Sent: Wednesday, December 15, 2010 8:46 PM
To: Biojava-l at lists.open-bio.org
Subject: [Biojava-l] how to cancel download chemcomp when parser a PDB file

Hi, dear all:

I use biojava3 beta1 to parse the PDB files recently, my program is:

    PDBFileReader pdbreader = new PDBFileReader();
    pdbreader.setAutoFetch(false);
    pdbreader.setPath(pdbDirPath);

    FileParsingParameters params = new FileParsingParameters();
    params.setLoadChemCompInfo(false);
    params.setHeaderOnly(false);
    params.setAlignSeqRes(true);
    params.setParseSecStruc(false);
    pdbreader.setFileParsingParameters(params);

    Structure structure = null;
    try {
        structure = pdbreader.getStructure(pdbDirPath + "\\" + file);
    } catch (IOException e) {
        e.printStackTrace();
    }

when I execute this program, it will download something such as:

    creating directory D:\MyWorkspace\TestFiles\pdbFiles\chemcomp
    downloading http://www.rcsb.org/pdb/files/ligand/35G.cif
    downloading http://www.rcsb.org/pdb/files/ligand/GDP.cif

but I do not want to download those stuff, How can I cancel it? Thanks.
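
Steve's description of translating non-standard residues to their closest standard counterpart can be pictured with a small lookup. The following is only a minimal sketch in plain Java and is not BioJava code: the class and method names are hypothetical, the parent table is a tiny hand-picked subset (MSE, SEP, TPO, PTR), and the fallback to "X" for unknown residues is an assumption made for illustration. In BioJava this information would come from the chemical component definitions rather than a hard-coded table.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in BioJava.
class ResidueTranslationSketch {
    // Standard amino acids: three-letter PDB name -> one-letter code.
    private static final Map<String, Character> STANDARD = new HashMap<>();
    // A few well-known modified residues -> standard parent (illustrative subset).
    private static final Map<String, String> PARENT = new HashMap<>();

    static {
        String[][] std = {
            {"ALA", "A"}, {"ARG", "R"}, {"ASN", "N"}, {"ASP", "D"}, {"CYS", "C"},
            {"GLN", "Q"}, {"GLU", "E"}, {"GLY", "G"}, {"HIS", "H"}, {"ILE", "I"},
            {"LEU", "L"}, {"LYS", "K"}, {"MET", "M"}, {"PHE", "F"}, {"PRO", "P"},
            {"SER", "S"}, {"THR", "T"}, {"TRP", "W"}, {"TYR", "Y"}, {"VAL", "V"}
        };
        for (String[] pair : std) {
            STANDARD.put(pair[0], pair[1].charAt(0));
        }
        PARENT.put("MSE", "MET"); // selenomethionine
        PARENT.put("SEP", "SER"); // phosphoserine
        PARENT.put("TPO", "THR"); // phosphothreonine
        PARENT.put("PTR", "TYR"); // phosphotyrosine
    }

    // Returns the one-letter code; 'X' when the residue cannot be resolved.
    static char toOneLetter(String pdbName, boolean useParentTable) {
        Character code = STANDARD.get(pdbName);
        if (code != null) {
            return code;
        }
        if (useParentTable) {
            String parent = PARENT.get(pdbName);
            if (parent != null) {
                return STANDARD.get(parent);
            }
        }
        return 'X';
    }

    static String toSequence(List<String> pdbNames, boolean useParentTable) {
        StringBuilder sb = new StringBuilder();
        for (String name : pdbNames) {
            sb.append(toOneLetter(name, useParentTable));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> residues = Arrays.asList("PHE", "LEU", "MSE", "ARG", "SEP");
        System.out.println(toSequence(residues, true));  // prints FLMRS
        System.out.println(toSequence(residues, false)); // prints FLXRX
    }
}
```

With the parent table the modified residues line up with their standard counterparts; without it they collapse to "X", which is the kind of information a sequence alignment loses when the component definitions are not loaded.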
From andreas at sdsc.edu Fri Dec 17 15:24:16 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 17 Dec 2010 07:24:16 -0800
Subject: [Biojava-l] how to cancel download chemcomp when parser a PDB file
In-Reply-To:
References:
Message-ID:

OK, that behavior is fixed in SVN now. You can now have setAlignSeqRes set to true, and it will not download chemical components if loadChemComp is false. The drawback is that the data representation will not be as precise.

Andreas

On Thu, Dec 16, 2010 at 8:26 AM, Steve Darnell wrote:
> The SeqRes to Atom record alignment forces the use of chemical
> components to translate non-standard residues to their closest standard
> counterpart for the sequence alignment. I have to disable
> setLoadChemCompInfo and setAlignSeqRes when I don't want to download
> chemical component files from RCSB when parsing a PDB file.
>
> Regards,
> Steve

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From dasarnow at gmail.com Sun Dec 19 03:36:28 2010
From: dasarnow at gmail.com (Daniel Asarnow)
Date: Sat, 18 Dec 2010 19:36:28 -0800
Subject: [Biojava-l] how to cancel download chemcomp when parser a PDB file
In-Reply-To:
References:

Message-ID:

Prior to this, did setting loadChemComp to true add processing overhead if setAlignSeqRes was also true? I.e., what's the difference between AlignSeqRes with and without loadChemComp?

I just want to know what the right flags are when one wants accurate SEQRES <---> ATOM alignments but isn't otherwise using the components info...

On a related note, I ended up writing a class that loaded and discarded Structure objects for my PDBs, to trigger all the downloads before my big processing jobs. Though I guess the right (non-lazy) thing to do is to parse out the combined components library into the individual files.

-da

On Fri, Dec 17, 2010 at 07:24, Andreas Prlic wrote:
> ok that behavior is fixed in SVN now. Now you can have setAlignSeqRes
> set to true and it will not download chemical components if
> loadChemComp is false. The drawback is that the data representation
> will not be as precise.
>
> Andreas

From dasarnow at gmail.com Sun Dec 19 03:47:27 2010
From: dasarnow at gmail.com (Daniel Asarnow)
Date: Sat, 18 Dec 2010 19:47:27 -0800
Subject: [Biojava-l] Thread safety of the AtomCache
Message-ID:

Is AtomCache expected to be thread-safe? It would be nice to have multiple threads work off the same AtomCache, when those threads might each ask for some of the same PDB IDs.

Best,

-da

From andreas at sdsc.edu Sun Dec 19 08:19:40 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 19 Dec 2010 00:19:40 -0800
Subject: [Biojava-l] Thread safety of the AtomCache
In-Reply-To:
References:
Message-ID:

Hi Daniel,

Yes, the AtomCache is thread-safe. Caching across multiple threads was one of the ideas behind this class...

Andreas

On Sat, Dec 18, 2010 at 7:47 PM, Daniel Asarnow wrote:
> Is AtomCache expected to be thread-safe? It would be nice to have multiple
> threads work off the same AtomCache, when those threads might each ask for
> some of the same PDB IDs.
>
> Best,
>
> -da

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From andreas at sdsc.edu Sun Dec 19 08:38:08 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 19 Dec 2010 00:38:08 -0800
Subject: [Biojava-l] how to cancel download chemcomp when parser a PDB file
In-Reply-To:
References:

Message-ID:

Hi Daniel,

The chemical components provide the chemically correct definition of the various groups. There are quite a few chemically modified amino acids in PDB files which, based on these definitions, can be represented as amino acids rather than Hetatom groups. This has an impact on the sequence alignment that is done during the alignSeqRes process. Without the correct representations, those groups would be flagged as "X", or might be missing. I will set up a wiki page that explains all the parsing options in detail.

About your second comment: there is now a new ChemCompProvider interface. Perhaps there should be a new implementation of it that downloads the file containing all chemcomps bundled together and provides the data from there...

Andreas

On Sat, Dec 18, 2010 at 7:36 PM, Daniel Asarnow wrote:
> Prior to this, did setting loadChemComp true add processing overhead if
> setAlignSeqRes is also true? I.e. what's the difference between AlignSeqRes
> with and without loadChemComp?
> Just want to know what the right flags are when one wants accurate SEQRES
> <---> ATOM alignments but isn't otherwise using the components info...
> On a related note, I ended up writing a class that loaded and discarded
> Structure objects for my PDBs, to trigger all the downloads before my big
> processing jobs. Though I guess the right (non-lazy) thing to do is parse
> out the combined components library to the individual files.
> -da

From andreas at sdsc.edu Sun Dec 19 14:29:13 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 19 Dec 2010 06:29:13 -0800
Subject: [Biojava-l] biojava3.0-beta4
Message-ID:

The fourth beta version of BioJava 3.0 has been released in our Maven repository: http://biojava.org/download/maven/

New things:
- tons of javadoc improvements
- a patch in the protmod module
- bug fix regarding automated download of chemcomp files in the structure module

A new set of javadoc files is available from: http://www.biojava.org/docs/api_latest/

Andreas

From wuuter at gmail.com Mon Dec 20 05:31:24 2010
From: wuuter at gmail.com (Fico)
Date: Mon, 20 Dec 2010 13:31:24 +0800
Subject: [Biojava-l] how to cancel download chemcomp when parser a PDB file
In-Reply-To:
References:

Message-ID:

The ChemComp download question is now resolved, but I found a new problem when testing bioJava3-Beta4. My program fragment:

    FileParsingParameters params = new FileParsingParameters();
    params.setLoadChemCompInfo(false);
    params.setHeaderOnly(false);
    // params.setParseCAOnly(true);
    params.setAlignSeqRes(true);
    params.setParseSecStruc(false);

    // loop file
    for (String file : getPdbFiles()) {
        PDBFileReader pdbreader = new PDBFileReader();
        pdbreader.setAutoFetch(false);
        pdbreader.setPath(getPdbDir());
        pdbreader.setFileParsingParameters(params);
        // pdbreader.setLoadChemCompInfo(true);

        Structure struc = null;
        try {
            struc = pdbreader.getStructure(getPdbDir() + "\\" + file);
        } catch (IOException e) {
            e.printStackTrace();
        }

        String pdbid = struc.getPDBCode();
        for (int i = 0; i < struc.nrModels(); i++) {
            // loop chain
            for (Chain ch : struc.getModel(i)) {
                System.out.println(pdbid + ">>>" + ch.getChainID() + ">>>" + ch.getAtomSequence());
                System.out.println(pdbid + ">>>" + ch.getChainID() + ">>>" + ch.getSeqResSequence());
                // Test the getAtomGroups() and getSeqResGroups() method
                // List<Group> group = ch.getAtomGroups();
                List<Group> group = ch.getSeqResGroups();
                for (Group gp : group) {
                    System.out.println(gp.getResidueNumber() + ":" + gp.getPDBName());
                }
            }
        }
    }

My test PDB file is 1O1G.pdb; there are 45 modified residues in chain A. When I use .getAtomGroups() I get all the residues' atom information, such as ResidueNumber and PDBName:

    797:PHE
    798:LEU
    799:MET
    800:ARG
    801:VAL
    802:GLU
    ......
    840:PRO
    841:LEU
    842:LEU
    843:LYS

But with .getSeqResGroups(), the last 45 residues are missing some information, such as ResidueNumber and atom coordinates. The output of the program is:

    797:PHE
    798:LEU
    null:MET
    null:ARG
    null:VAL
    null:GLU
    ......
    null:PRO
    null:LEU
    null:LEU
    null:LYS

In biojava3-Beta1 the two methods produce the same result, matching .getAtomGroups() in Beta4. So is it a bug?

P.S. Could we add a new method to get the full amino acid sequence, including modified residues, directly? Currently neither getAtomSequence() nor getSeqResSequence() can do this; if I want the amino acid sequence with modified residues, I have to call .getAtomGroups() or .getSeqResGroups() first and then loop over each residue to build the one-letter amino acid sequence.

2010/12/17 Andreas Prlic

> ok that behavior is fixed in SVN now. Now you can have setAlignSeqRes
> set to true and it will not download chemical components if
> loadChemComp is false. The drawback is that the data representation
> will not be as precise.
>
> Andreas

From andreas at sdsc.edu Tue Dec 21 23:55:39 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 21 Dec 2010 15:55:39 -0800
Subject: [Biojava-l] how to cancel download chemcomp when parser a PDB file
In-Reply-To:
References: