[Biojava-dev] DownloadChemCompProvider exception

Andreas Prlic andreas at sdsc.edu
Tue Jan 25 16:58:24 UTC 2011


Hi Steve,


> 1. Rename the files _CON.cif.gz: Chrome and 7-Zip automatically prepend an underscore to reserved filenames.  Firefox and Internet Explorer only report an error.  I have not found a style guideline for this condition; however, I would follow the existing "precedent" (as limited as it is).

OK, changed in SVN.

> 2. Add 'AUX' and 'NUL' to the list of protected IDs to avoid potential namespace clashes in the future


Added.
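
A minimal sketch of the kind of ID-to-filename mapping discussed in
points 1 and 2. The class and method names (ChemCompFileNames,
toFileName) are hypothetical and only illustrate the idea; this is not
the code committed to SVN. It assumes the underscore prefix from point 1
and the four protected IDs named in this thread.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration; not part of BioJava. */
final class ChemCompFileNames {

    // Protected IDs named in this thread. Windows reserves further device
    // names (for example COM1 or LPT1); those have four characters and are
    // assumed not to collide with the three-character IDs discussed here.
    private static final Set<String> PROTECTED_IDS =
            new HashSet<String>(Arrays.asList("CON", "PRN", "AUX", "NUL"));

    private ChemCompFileNames() {
    }

    /** Returns the local file name for a chem comp ID, prefixing "_" to protected IDs. */
    static String toFileName(String chemCompId) {
        String id = chemCompId.toUpperCase(Locale.ROOT);
        String prefix = PROTECTED_IDS.contains(id) ? "_" : "";
        return prefix + id + ".cif.gz";
    }

    public static void main(String[] args) {
        System.out.println(toFileName("CON")); // protected ID, gets the prefix
        System.out.println(toFileName("ATP")); // ordinary ID, left unchanged
    }
}
```

Keeping the protected IDs in one set means a further reserved name only
needs a new entry there, and the same mapping can be used when reading
the files back.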


> I have one additional comment based on this snippet of code:

I am trying to simplify this... You can now do the following to get a
local install of all chemical components:

		DownloadChemCompProvider.setPath("/path/to/local/PDB_DIR/");
		DownloadChemCompProvider c = new DownloadChemCompProvider();
		c.setDownloadAll(true);
		c.checkDoFirstInstall();

The path is stored as a System property "PDB_DIR", which is used by
all classes that need to access local PDB or chem comp files. (Chem
comp files are always expected to be in PDB_DIR/chemcomp.)
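
To illustrate the lookup described above, here is a rough sketch of how
a class could resolve the chem comp directory from the "PDB_DIR" System
property. The class and method names (PdbDirExample, chemCompDir) are
made up for this example, and the behaviour when the property is unset
is an assumption, not what BioJava does.

```java
import java.io.File;

/** Hypothetical example; the names here are not part of BioJava. */
final class PdbDirExample {

    // Name of the System property mentioned in this thread.
    static final String PDB_DIR_PROPERTY = "PDB_DIR";

    private PdbDirExample() {
    }

    /** Resolves PDB_DIR/chemcomp from the System property. */
    static File chemCompDir() {
        String pdbDir = System.getProperty(PDB_DIR_PROPERTY);
        if (pdbDir == null) {
            // Assumption: fail loudly instead of guessing a default location.
            throw new IllegalStateException(
                    "System property " + PDB_DIR_PROPERTY + " is not set");
        }
        return new File(pdbDir, "chemcomp");
    }

    public static void main(String[] args) {
        System.setProperty(PDB_DIR_PROPERTY, "/path/to/local/PDB_DIR/");
        System.out.println(chemCompDir().getPath());
    }
}
```

The property can also be set at JVM start-up with
-DPDB_DIR=/path/to/local/PDB_DIR/ instead of in code.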

Andreas




>
> import org.biojava.bio.structure.io.PDBFileReader;
> import org.biojava.bio.structure.io.mmcif.DownloadChemCompProvider;
>
> public class ChemCompDistribution {
>
>        public static void main( String[] args ) {
>                PDBFileReader r = new PDBFileReader();
>                r.setPath("C:\\Users\\darnells\\Desktop\\Chemical Component Dictionary\\");
>                DownloadChemCompProvider.setPath("C:\\Users\\darnells\\Desktop\\Chemical Component Dictionary\\");
>                DownloadChemCompProvider c = new DownloadChemCompProvider();
>                c.setDownloadAll(true);
>                c.checkDoFirstInstall();
>        }
> }
>
> When using DownloadChemCompProvider.checkDoFirstInstall(), you end up using AllChemCompProvider.checkPath() and never call DownloadChemCompProvider.checkPath() (leaving DownloadChemCompProvider.path unset).  I got into a situation where I have to set the PDBFileReader path to download components.cif.gz and then set the DownloadChemCompProvider path separately to split it (never gets set otherwise).
>
> I understand the reuse of AllChemCompProvider.downloadFile() influenced this implementation, but I think this problem should be resolved.  I suggest that setting the path in either PDBFileReader or DownloadChemCompProvider should lead to success.
>
> Thanks again,
> Steve
>
> -----Original Message-----
> From: andreas.prlic at gmail.com [mailto:andreas.prlic at gmail.com] On Behalf Of Andreas Prlic
> Sent: Monday, January 24, 2011 7:22 PM
> To: Steve Darnell
> Cc: biojava-dev at lists.open-bio.org
> Subject: Re: [Biojava-dev] DownloadChemCompProvider exception
>
> Hi Steve,
>
> Thanks for spotting this. Would it help to rename the local files to
> something like CON_1.cif.gz? I just committed a check that maps the
> IDs to different filenames.
>
> Andreas
>
>
> On Mon, Jan 24, 2011 at 3:41 PM, Steve Darnell <darnells at dnastar.com> wrote:
>> Greetings,
>>
>> I get the following exception when using the DownloadChemCompProvider on
>> Windows (XP/Vista/7):
>>
>> Installing individual chem comp files ...
>> java.io.FileNotFoundException: C:\Users\darnells\Desktop\Chemical Component Dictionary\chemcomp\CON.cif.gz (The handle is invalid)
>>        at java.io.FileOutputStream.open(Native Method)
>>        at java.io.FileOutputStream.<init>(Unknown Source)
>>        at java.io.FileOutputStream.<init>(Unknown Source)
>>        at org.biojava.bio.structure.io.mmcif.DownloadChemCompProvider.writeID(DownloadChemCompProvider.java:159)
>>        at org.biojava.bio.structure.io.mmcif.DownloadChemCompProvider.split(DownloadChemCompProvider.java:130)
>>        at org.biojava.bio.structure.io.mmcif.DownloadChemCompProvider.checkDoFirstInstall(DownloadChemCompProvider.java:99)
>>        at steve.sandbox.playground.ChemCompDistribution.main(ChemCompDistribution.java:14)
>> created 3990 chemical component files.
>>
>>
>> The problem is that CON.cif.gz and PRN.cif.gz use the reserved Windows
>> filenames 'CON' and 'PRN'
>> (http://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words).
>> These names are reserved regardless of the file extension.  I have
>> spoken with RCSB in the past.  They are unable to change the code for
>> these records (and the five affected PDB files: 2I7S, 1TUM, 1U1N, 1U1L,
>> and 1CL8).
>>
>> At minimum, DownloadChemCompProvider should catch the exception and
>> proceed with the remaining records.  This seems reasonable given the low
>> number of affected PDB files.
>>
>> Regards,
>> Steve
>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>



-- 
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------



