[Open-bio-l] Common Sample Data Collection, was: SCF files (Staden)
Peter Rice
pmr at ebi.ac.uk
Wed Nov 30 11:38:30 UTC 2011
On 11/30/2011 11:32 AM, Pjotr Prins wrote:
> Git is not very good for storing large data files, which we would want
> to fetch partially. My suggestion would be to have a plain old file
> repo, e.g. on S3, which can be mirrored by others.
We had issues with large files in the EMBOSS release, and now make those
available via rsync to add to the developers' CVS checkout. They include
the NCBI taxonomy source and index files, and the ontology source and
index files.
The next EMBOSS release will accept http and ftp URLs as valid inputs
for any data type, so EMBOSS could use remote files for format tests. I'll
look into how other repositories could be added.
I had to add some extra qualifiers to allow queries and offsets to be
specified, and rewrote the query language parsing to merge very similar
code segments.
regards,
Peter Rice
EMBOSS Team