superset/docs/docs/installation/cache.mdx

120 lines
4.4 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
title: Caching
hide_title: true
sidebar_position: 5
version: 1
---
## Caching
Superset uses [Flask-Caching](https://flask-caching.readthedocs.io/) for caching purpose. Configuring caching is as easy as providing a custom cache config in your
`superset_config.py` that complies with [the Flask-Caching specifications](https://flask-caching.readthedocs.io/en/latest/#configuring-flask-caching).
Flask-Caching supports various caching backends, including Redis, Memcached, SimpleCache (in-memory), or the
local filesystem. Custom cache backends are also supported. See [here](https://flask-caching.readthedocs.io/en/latest/#custom-cache-backends) for specifics.
The following cache configurations can be customized:
- Metadata cache (optional): `CACHE_CONFIG`
- Charting data queried from datasets (optional): `DATA_CACHE_CONFIG`
- SQL Lab query results (optional): `RESULTS_BACKEND`. See [Async Queries via Celery](/docs/installation/async-queries-celery) for details
- Dashboard filter state (required): `FILTER_STATE_CACHE_CONFIG`.
- Explore chart form data (required): `EXPLORE_FORM_DATA_CACHE_CONFIG`
Please note, that Dashboard and Explore caching is required. When running Superset in debug mode, both Explore and Dashboard caches will default to `SimpleCache`;
However, trying to run Superset in non-debug mode without defining a cache for these will cause the application to fail on startup. When running
superset in single-worker mode, any cache backend is supported. However, when running Superset in on a multi-worker setup, a dedicated cache is required. For this
we recommend using either Redis or Memcached:
- Redis (recommended): we recommend the [redis](https://pypi.python.org/pypi/redis) Python package
- Memcached: we recommend using [pylibmc](https://pypi.org/project/pylibmc/) client library as
`python-memcached` does not handle storing binary data correctly.
Both of these libraries can be installed using pip.
For chart data, Superset goes up a “timeout search path”, from a slice's configuration
to the datasources, the databases, then ultimately falls back to the global default
defined in `DATA_CACHE_CONFIG`.
## Celery beat
Superset has a Celery task that will periodically warm up the cache based on different strategies.
To use it, add the following to the `CELERYBEAT_SCHEDULE` section in `config.py`:
```python
CELERYBEAT_SCHEDULE = {
'cache-warmup-hourly': {
'task': 'cache-warmup',
'schedule': crontab(minute=0, hour='*'), # hourly
'kwargs': {
'strategy_name': 'top_n_dashboards',
'top_n': 5,
'since': '7 days ago',
},
},
}
```
This will cache all the charts in the top 5 most popular dashboards every hour. For other
strategies, check the `superset/tasks/cache.py` file.
### Caching Thumbnails
This is an optional feature that can be turned on by activating its feature flag on config:
```
FEATURE_FLAGS = {
"THUMBNAILS": True,
"THUMBNAILS_SQLA_LISTENERS": True,
}
```
For this feature you will need a cache system and celery workers. All thumbnails are stored on cache
and are processed asynchronously by the workers.
An example config where images are stored on S3 could be:
```python
from flask import Flask
from s3cache.s3cache import S3Cache
...
class CeleryConfig(object):
BROKER_URL = "redis://localhost:6379/0"
CELERY_IMPORTS = ("superset.sql_lab", "superset.tasks", "superset.tasks.thumbnails")
CELERY_RESULT_BACKEND = "redis://localhost:6379/0"
CELERYD_PREFETCH_MULTIPLIER = 10
CELERY_ACKS_LATE = True
CELERY_CONFIG = CeleryConfig
def init_thumbnail_cache(app: Flask) -> S3Cache:
return S3Cache("bucket_name", 'thumbs_cache/')
THUMBNAIL_CACHE_CONFIG = init_thumbnail_cache
# Async selenium thumbnail task will use the following user
THUMBNAIL_SELENIUM_USER = "Admin"
```
Using the above example cache keys for dashboards will be `superset_thumb__dashboard__{ID}`. You can
override the base URL for selenium using:
```
WEBDRIVER_BASEURL = "https://superset.company.com"
```
Additional selenium web drive configuration can be set using `WEBDRIVER_CONFIGURATION`. You can
implement a custom function to authenticate selenium. The default function uses the `flask-login`
session cookie. Here's an example of a custom function signature:
```python
def auth_driver(driver: WebDriver, user: "User") -> WebDriver:
pass
```
Then on configuration:
```
WEBDRIVER_AUTH_FUNC = auth_driver
```