Downloading the dataset
The dataset is published as a dataset.tar.gz asset attached to each
tagged release of the FLARE GitHub repository.
There are three common ways to download it.
Tip
You can browse the dataset contents in Problems or on GitHub without downloading it.
Python (recommended)
formulation-bench ships a download_dataset helper that fetches the
tarball, extracts it into a local cache directory, and
returns the extracted path. Subsequent calls reuse the cached copy.
from formulation_bench import download_dataset
root = download_dataset() # latest version this package targets
root = download_dataset("dataset-v0.2.0") # download a specific release tag
root = download_dataset(force=True) # re-download, overwriting cache
It is often more convenient to use Dataset.load():
from formulation_bench import Dataset
ds = Dataset.load()
Cache location, in order of precedence:
1. cache_dir= argument to download_dataset / Dataset.load.
2. $FORMULATION_BENCH_CACHE environment variable.
3. $XDG_CACHE_HOME/formulation_bench if $XDG_CACHE_HOME is set.
4. ~/.cache/formulation_bench.
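The precedence order above can be sketched in a few lines of Python. This is only an illustration, not the package's actual implementation: resolve_cache_dir is a hypothetical name, and treating an empty environment variable as unset is an assumption.

```python
import os
from pathlib import Path


def resolve_cache_dir(cache_dir=None, environ=None):
    """Hypothetical helper that mirrors the documented precedence order."""
    env = os.environ if environ is None else environ
    # 1. An explicit cache_dir argument wins.
    if cache_dir is not None:
        return Path(cache_dir).expanduser()
    # 2. Then the package-specific environment variable.
    #    Assumption: an empty value is treated as unset.
    if env.get("FORMULATION_BENCH_CACHE"):
        return Path(env["FORMULATION_BENCH_CACHE"]).expanduser()
    # 3. Then the XDG cache home, with a formulation_bench subdirectory.
    if env.get("XDG_CACHE_HOME"):
        return Path(env["XDG_CACHE_HOME"]) / "formulation_bench"
    # 4. Finally the default under the user's home directory.
    return Path.home() / ".cache" / "formulation_bench"
```

Passing the environment as a plain dict, as in resolve_cache_dir(None, {"XDG_CACHE_HOME": "/tmp/cache"}), makes it easy to check which location would be chosen without changing the real environment.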
curl
If you just want the dataset files on disk, fetch the tarball directly from the release page:
VERSION=dataset-v0.2.0
curl -L -o dataset.tar.gz \
"https://github.com/henryrobbins/flare/releases/download/${VERSION}/dataset.tar.gz"
mkdir -p formulation-bench && tar -xzf dataset.tar.gz -C formulation-bench
Warning
The archive expands to a top-level dataset/ directory, so running
tar -xzf in a working directory that already contains dataset/ will overwrite files in it. Extract into a fresh directory, as in the command above.
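If you prefer to extract from Python, the same precaution can be sketched with the standard library. This is a minimal sketch, not part of formulation-bench: extract_fresh is a hypothetical helper, and it assumes the archive has the top-level dataset/ layout described above.

```python
import tarfile
from pathlib import Path


def extract_fresh(archive, dest):
    """Extract archive into dest, refusing to touch an existing dataset/ directory."""
    dest = Path(dest)
    target = dest / "dataset"  # assumed top-level directory inside the archive
    if target.exists():
        raise FileExistsError(f"{target} already exists; choose a fresh directory")
    dest.mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter where this Python version supports it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest, **kwargs)
    return target
```

Calling extract_fresh("dataset.tar.gz", "formulation-bench") mirrors the mkdir and tar commands above, but raises an error instead of overwriting an existing dataset/ directory.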
You can now load the dataset with:
from formulation_bench import Dataset
ds = Dataset("formulation-bench/dataset")
GitHub website
1. Open the FLARE releases page.
2. Pick a release (e.g. dataset-v0.2.0).
3. Under Assets, click dataset.tar.gz.
4. Extract it with your archive tool of choice.