120 lines
4.4 KiB
Plaintext
120 lines
4.4 KiB
Plaintext
---
|
||
title: Caching
|
||
hide_title: true
|
||
sidebar_position: 5
|
||
version: 1
|
||
---
|
||
|
||
## Caching
|
||
|
||
Superset uses [Flask-Caching](https://flask-caching.readthedocs.io/) for caching purpose. Configuring caching is as easy as providing a custom cache config in your
|
||
`superset_config.py` that complies with [the Flask-Caching specifications](https://flask-caching.readthedocs.io/en/latest/#configuring-flask-caching).
|
||
Flask-Caching supports various caching backends, including Redis, Memcached, SimpleCache (in-memory), or the
|
||
local filesystem. Custom cache backends are also supported. See [here](https://flask-caching.readthedocs.io/en/latest/#custom-cache-backends) for specifics.
|
||
The following cache configurations can be customized:
|
||
- Metadata cache (optional): `CACHE_CONFIG`
|
||
- Charting data queried from datasets (optional): `DATA_CACHE_CONFIG`
|
||
- SQL Lab query results (optional): `RESULTS_BACKEND`. See [Async Queries via Celery](/docs/installation/async-queries-celery) for details
|
||
- Dashboard filter state (required): `FILTER_STATE_CACHE_CONFIG`.
|
||
- Explore chart form data (required): `EXPLORE_FORM_DATA_CACHE_CONFIG`
|
||
|
||
Please note, that Dashboard and Explore caching is required. When running Superset in debug mode, both Explore and Dashboard caches will default to `SimpleCache`;
|
||
However, trying to run Superset in non-debug mode without defining a cache for these will cause the application to fail on startup. When running
|
||
superset in single-worker mode, any cache backend is supported. However, when running Superset in on a multi-worker setup, a dedicated cache is required. For this
|
||
we recommend using either Redis or Memcached:
|
||
|
||
- Redis (recommended): we recommend the [redis](https://pypi.python.org/pypi/redis) Python package
|
||
- Memcached: we recommend using [pylibmc](https://pypi.org/project/pylibmc/) client library as
|
||
`python-memcached` does not handle storing binary data correctly.
|
||
|
||
Both of these libraries can be installed using pip.
|
||
|
||
For chart data, Superset goes up a “timeout search path”, from a slice's configuration
|
||
to the datasource’s, the database’s, then ultimately falls back to the global default
|
||
defined in `DATA_CACHE_CONFIG`.
|
||
|
||
## Celery beat
|
||
|
||
Superset has a Celery task that will periodically warm up the cache based on different strategies.
|
||
To use it, add the following to the `CELERYBEAT_SCHEDULE` section in `config.py`:
|
||
|
||
```python
|
||
CELERYBEAT_SCHEDULE = {
|
||
'cache-warmup-hourly': {
|
||
'task': 'cache-warmup',
|
||
'schedule': crontab(minute=0, hour='*'), # hourly
|
||
'kwargs': {
|
||
'strategy_name': 'top_n_dashboards',
|
||
'top_n': 5,
|
||
'since': '7 days ago',
|
||
},
|
||
},
|
||
}
|
||
```
|
||
|
||
This will cache all the charts in the top 5 most popular dashboards every hour. For other
|
||
strategies, check the `superset/tasks/cache.py` file.
|
||
|
||
### Caching Thumbnails
|
||
|
||
This is an optional feature that can be turned on by activating it’s feature flag on config:
|
||
|
||
```
|
||
FEATURE_FLAGS = {
|
||
"THUMBNAILS": True,
|
||
"THUMBNAILS_SQLA_LISTENERS": True,
|
||
}
|
||
```
|
||
|
||
For this feature you will need a cache system and celery workers. All thumbnails are stored on cache
|
||
and are processed asynchronously by the workers.
|
||
|
||
An example config where images are stored on S3 could be:
|
||
|
||
```python
|
||
from flask import Flask
|
||
from s3cache.s3cache import S3Cache
|
||
|
||
...
|
||
|
||
class CeleryConfig(object):
|
||
BROKER_URL = "redis://localhost:6379/0"
|
||
CELERY_IMPORTS = ("superset.sql_lab", "superset.tasks", "superset.tasks.thumbnails")
|
||
CELERY_RESULT_BACKEND = "redis://localhost:6379/0"
|
||
CELERYD_PREFETCH_MULTIPLIER = 10
|
||
CELERY_ACKS_LATE = True
|
||
|
||
|
||
CELERY_CONFIG = CeleryConfig
|
||
|
||
def init_thumbnail_cache(app: Flask) -> S3Cache:
|
||
return S3Cache("bucket_name", 'thumbs_cache/')
|
||
|
||
|
||
THUMBNAIL_CACHE_CONFIG = init_thumbnail_cache
|
||
# Async selenium thumbnail task will use the following user
|
||
THUMBNAIL_SELENIUM_USER = "Admin"
|
||
```
|
||
|
||
Using the above example cache keys for dashboards will be `superset_thumb__dashboard__{ID}`. You can
|
||
override the base URL for selenium using:
|
||
|
||
```
|
||
WEBDRIVER_BASEURL = "https://superset.company.com"
|
||
```
|
||
|
||
Additional selenium web drive configuration can be set using `WEBDRIVER_CONFIGURATION`. You can
|
||
implement a custom function to authenticate selenium. The default function uses the `flask-login`
|
||
session cookie. Here's an example of a custom function signature:
|
||
|
||
```python
|
||
def auth_driver(driver: WebDriver, user: "User") -> WebDriver:
|
||
pass
|
||
```
|
||
|
||
Then on configuration:
|
||
|
||
```
|
||
WEBDRIVER_AUTH_FUNC = auth_driver
|
||
```
|