438 lines
15 KiB
Plaintext
438 lines
15 KiB
Plaintext
---
|
|
title: Kubernetes
|
|
hide_title: true
|
|
sidebar_position: 1
|
|
version: 1
|
|
---
|
|
|
|
## Installing on Kubernetes
|
|
|
|
Running Superset on Kubernetes is supported with the provided [Helm](https://helm.sh/) chart
|
|
found in the official [Superset helm repository](https://apache.github.io/superset/index.yaml).
|
|
|
|
### Prerequisites
|
|
|
|
- A Kubernetes cluster
|
|
- Helm installed
|
|
|
|
:::note
|
|
For simpler, single host environments, we recommend using
|
|
[minikube](https://minikube.sigs.k8s.io/docs/start/) which is easy to setup on many platforms
|
|
and works fantastically well with the Helm chart referenced here.
|
|
:::
|
|
|
|
|
|
### Running
|
|
|
|
1. Add the Superset helm repository
|
|
|
|
```sh
|
|
helm repo add superset https://apache.github.io/superset
|
|
"superset" has been added to your repositories
|
|
```
|
|
|
|
2. View charts in repo
|
|
|
|
```sh
|
|
helm search repo superset
|
|
NAME CHART VERSION APP VERSION DESCRIPTION
|
|
superset/superset 0.1.1 1.0 Apache Superset is a modern, enterprise-ready b...
|
|
```
|
|
|
|
3. Configure your setting overrides
|
|
|
|
Just like any typical Helm chart, you'll need to craft a `values.yaml` file that would define/override any of the values exposed into the default [values.yaml](https://github.com/apache/superset/tree/master/helm/superset/values.yaml), or from any of the dependent charts it depends on:
|
|
|
|
- [bitnami/redis](https://artifacthub.io/packages/helm/bitnami/redis)
|
|
- [bitnami/postgresql](https://artifacthub.io/packages/helm/bitnami/postgresql)
|
|
|
|
More info down below on some important overrides you might need.
|
|
|
|
4. Install and run
|
|
|
|
```sh
|
|
helm upgrade --install --values my-values.yaml superset superset/superset
|
|
```
|
|
|
|
You should see various pods popping up, such as:
|
|
|
|
```sh
|
|
kubectl get pods
|
|
NAME READY STATUS RESTARTS AGE
|
|
superset-celerybeat-7cdcc9575f-k6xmc 1/1 Running 0 119s
|
|
superset-f5c9c667-dw9lp 1/1 Running 0 4m7s
|
|
superset-f5c9c667-fk8bk 1/1 Running 0 4m11s
|
|
superset-init-db-zlm9z 0/1 Completed 0 111s
|
|
superset-postgresql-0 1/1 Running 0 6d20h
|
|
superset-redis-master-0 1/1 Running 0 6d20h
|
|
superset-worker-75b48bbcc-jmmjr 1/1 Running 0 4m8s
|
|
superset-worker-75b48bbcc-qrq49 1/1 Running 0 4m12s
|
|
```
|
|
|
|
The exact list will depend on some of your specific configuration overrides but you should generally expect:
|
|
|
|
- N `superset-xxxx-yyyy` and `superset-worker-xxxx-yyyy` pods (depending on your `supersetNode.replicaCount` and `supersetWorker.replicaCount` values)
|
|
- 1 `superset-postgresql-0` depending on your postgres settings
|
|
- 1 `superset-redis-master-0` depending on your redis settings
|
|
- 1 `superset-celerybeat-xxxx-yyyy` pod if you have `supersetCeleryBeat.enabled = true` in your values overrides
|
|
|
|
1. Access it
|
|
|
|
The chart will publish appropriate services to expose the Superset UI internally within your k8s cluster. To access it externally you will have to either:
|
|
|
|
- Configure the Service as a `LoadBalancer` or `NodePort`
|
|
- Set up an `Ingress` for it - the chart includes a definition, but will need to be tuned to your needs (hostname, tls, annotations etc...)
|
|
- Run `kubectl port-forward superset-xxxx-yyyy :8088` to directly tunnel one pod's port into your localhost
|
|
|
|
Depending how you configured external access, the URL will vary. Once you've identified the appropriate URL you can log in with:
|
|
|
|
- user: `admin`
|
|
- password: `admin`
|
|
|
|
### Important settings
|
|
|
|
#### Security settings
|
|
|
|
Default security settings and passwords are included but you **MUST** update them to run `prod` instances, in particular:
|
|
|
|
```yaml
|
|
postgresql:
|
|
postgresqlPassword: superset
|
|
```
|
|
|
|
Make sure, you set a unique strong complex alphanumeric string for your SECRET_KEY and use a tool to help you generate
|
|
a sufficiently random sequence.
|
|
|
|
- To generate a good key you can run, `openssl rand -base64 42`
|
|
|
|
```yaml
|
|
configOverrides:
|
|
secret: |
|
|
SECRET_KEY = 'YOUR_OWN_RANDOM_GENERATED_SECRET_KEY'
|
|
```
|
|
|
|
If you want to change the previous secret key then you should rotate the keys.
|
|
Default secret key for kubernetes deployment is `thisISaSECRET_1234`
|
|
|
|
```yaml
|
|
configOverrides:
|
|
my_override: |
|
|
PREVIOUS_SECRET_KEY = 'YOUR_PREVIOUS_SECRET_KEY'
|
|
SECRET_KEY = 'YOUR_OWN_RANDOM_GENERATED_SECRET_KEY'
|
|
init:
|
|
command:
|
|
- /bin/sh
|
|
- -c
|
|
- |
|
|
. {{ .Values.configMountPath }}/superset_bootstrap.sh
|
|
superset re-encrypt-secrets
|
|
. {{ .Values.configMountPath }}/superset_init.sh
|
|
```
|
|
|
|
:::note
|
|
Superset uses [Scarf Gateway](https://about.scarf.sh/scarf-gateway) to collect telemetry data. Knowing the installation counts for different Superset versions informs the project's decisions about patching and long-term support. Scarf purges personally identifiable information (PII) and provides only aggregated statistics.
|
|
|
|
To opt-out of this data collection in your Helm-based installation, edit the `repository:` line in your `helm/superset/values.yaml` file, replacing `apachesuperset.docker.scarf.sh/apache/superset` with `apache/superset` to pull the image directly from Docker Hub.
|
|
:::
|
|
|
|
#### Dependencies
|
|
|
|
Install additional packages and do any other bootstrap configuration in the bootstrap script.
|
|
For production clusters it's recommended to build own image with this step done in CI.
|
|
|
|
:::note
|
|
|
|
Superset requires a Python DB-API database driver and a SQLAlchemy
|
|
dialect to be installed for each datastore you want to connect to.
|
|
|
|
See [Install Database Drivers](/docs/databases/installing-database-drivers) for more information.
|
|
|
|
:::
|
|
|
|
The following example installs the Big Query and Elasticsearch database drivers so that you can
|
|
connect to those datasources in your Superset installation:
|
|
|
|
```yaml
|
|
bootstrapScript: |
|
|
#!/bin/bash
|
|
pip install psycopg2==2.9.6 \
|
|
sqlalchemy-bigquery==1.6.1 \
|
|
elasticsearch-dbapi==0.2.5 &&\
|
|
if [ ! -f ~/bootstrap ]; then echo "Running Superset with uid {{ .Values.runAsUser }}" > ~/bootstrap; fi
|
|
```
|
|
|
|
#### superset_config.py
|
|
|
|
The default `superset_config.py` is fairly minimal and you will very likely need to extend it. This is done by specifying one or more key/value entries in `configOverrides`, e.g.:
|
|
|
|
```yaml
|
|
configOverrides:
|
|
my_override: |
|
|
# This will make sure the redirect_uri is properly computed, even with SSL offloading
|
|
ENABLE_PROXY_FIX = True
|
|
FEATURE_FLAGS = {
|
|
"DYNAMIC_PLUGINS": True
|
|
}
|
|
```
|
|
|
|
Those will be evaluated as Helm templates and therefore will be able to reference other `values.yaml` variables e.g. `{{ .Values.ingress.hosts[0] }}` will resolve to your ingress external domain.
|
|
|
|
The entire `superset_config.py` will be installed as a secret, so it is safe to pass sensitive parameters directly... however it might be more readable to use secret env variables for that.
|
|
|
|
Full python files can be provided by running `helm upgrade --install --values my-values.yaml --set-file configOverrides.oauth=set_oauth.py`
|
|
|
|
#### Environment Variables
|
|
|
|
Those can be passed as key/values either with `extraEnv` or `extraSecretEnv` if they're sensitive. They can then be referenced from `superset_config.py` using e.g. `os.environ.get("VAR")`.
|
|
|
|
```yaml
|
|
extraEnv:
|
|
SMTP_HOST: smtp.gmail.com
|
|
SMTP_USER: user@gmail.com
|
|
SMTP_PORT: "587"
|
|
SMTP_MAIL_FROM: user@gmail.com
|
|
|
|
extraSecretEnv:
|
|
SMTP_PASSWORD: xxxx
|
|
|
|
configOverrides:
|
|
smtp: |
|
|
import ast
|
|
SMTP_HOST = os.getenv("SMTP_HOST","localhost")
|
|
SMTP_STARTTLS = ast.literal_eval(os.getenv("SMTP_STARTTLS", "True"))
|
|
SMTP_SSL = ast.literal_eval(os.getenv("SMTP_SSL", "False"))
|
|
SMTP_USER = os.getenv("SMTP_USER","superset")
|
|
SMTP_PORT = os.getenv("SMTP_PORT",25)
|
|
SMTP_PASSWORD = os.getenv("SMTP_PASSWORD","superset")
|
|
```
|
|
|
|
#### System packages
|
|
|
|
If new system packages are required, they can be installed before application startup by overriding the container's `command`, e.g.:
|
|
|
|
```yaml
|
|
supersetWorker:
|
|
command:
|
|
- /bin/sh
|
|
- -c
|
|
- |
|
|
apt update
|
|
apt install -y somepackage
|
|
apt autoremove -yqq --purge
|
|
apt clean
|
|
|
|
# Run celery worker
|
|
. {{ .Values.configMountPath }}/superset_bootstrap.sh; celery --app=superset.tasks.celery_app:app worker
|
|
```
|
|
|
|
#### Data sources
|
|
|
|
Data source definitions can be automatically declared by providing key/value yaml definitions in `extraConfigs`:
|
|
|
|
```yaml
|
|
extraConfigs:
|
|
import_datasources.yaml: |
|
|
databases:
|
|
- allow_file_upload: true
|
|
allow_ctas: true
|
|
allow_cvas: true
|
|
database_name: example-db
|
|
extra: "{\r\n \"metadata_params\": {},\r\n \"engine_params\": {},\r\n \"\
|
|
metadata_cache_timeout\": {},\r\n \"schemas_allowed_for_file_upload\": []\r\n\
|
|
}"
|
|
sqlalchemy_uri: example://example-db.local
|
|
tables: []
|
|
```
|
|
|
|
Those will also be mounted as secrets and can include sensitive parameters.
|
|
|
|
### Configuration Examples
|
|
|
|
#### Setting up OAuth
|
|
|
|
:::note
|
|
|
|
OAuth setup requires that the [authlib](https://authlib.org/) Python library is installed. This can
|
|
be done using `pip` by updating the `bootstrapScript`. See the [Dependencies](#dependencies) section
|
|
for more information.
|
|
|
|
:::
|
|
|
|
```yaml
|
|
extraEnv:
|
|
AUTH_DOMAIN: example.com
|
|
|
|
extraSecretEnv:
|
|
GOOGLE_KEY: xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com
|
|
GOOGLE_SECRET: xxxxxxxxxxxxxxxxxxxxxxxx
|
|
|
|
configOverrides:
|
|
enable_oauth: |
|
|
# This will make sure the redirect_uri is properly computed, even with SSL offloading
|
|
ENABLE_PROXY_FIX = True
|
|
|
|
from flask_appbuilder.security.manager import AUTH_OAUTH
|
|
AUTH_TYPE = AUTH_OAUTH
|
|
OAUTH_PROVIDERS = [
|
|
{
|
|
"name": "google",
|
|
"icon": "fa-google",
|
|
"token_key": "access_token",
|
|
"remote_app": {
|
|
"client_id": os.getenv("GOOGLE_KEY"),
|
|
"client_secret": os.getenv("GOOGLE_SECRET"),
|
|
"api_base_url": "https://www.googleapis.com/oauth2/v2/",
|
|
"client_kwargs": {"scope": "email profile"},
|
|
"request_token_url": None,
|
|
"access_token_url": "https://accounts.google.com/o/oauth2/token",
|
|
"authorize_url": "https://accounts.google.com/o/oauth2/auth",
|
|
"authorize_params": {"hd": os.getenv("AUTH_DOMAIN", "")}
|
|
},
|
|
}
|
|
]
|
|
|
|
# Map Authlib roles to superset roles
|
|
AUTH_ROLE_ADMIN = 'Admin'
|
|
AUTH_ROLE_PUBLIC = 'Public'
|
|
|
|
# Will allow user self registration, allowing to create Flask users from Authorized User
|
|
AUTH_USER_REGISTRATION = True
|
|
|
|
# The default user self registration role
|
|
AUTH_USER_REGISTRATION_ROLE = "Admin"
|
|
```
|
|
|
|
#### Enable Alerts and Reports
|
|
|
|
For this, as per the [Alerts and Reports doc](/docs/configuration/alerts-reports), you will need to:
|
|
|
|
##### Install a supported webdriver in the Celery worker
|
|
|
|
This is done either by using a custom image that has the webdriver pre-installed, or installing at startup time by overriding the `command`. Here's a working example for `chromedriver`:
|
|
|
|
```yaml
|
|
supersetWorker:
|
|
command:
|
|
- /bin/sh
|
|
- -c
|
|
- |
|
|
# Install chrome webdriver
|
|
# See https://github.com/apache/superset/blob/4fa3b6c7185629b87c27fc2c0e5435d458f7b73d/docs/src/pages/docs/installation/email_reports.mdx
|
|
apt-get update
|
|
apt-get install -y wget
|
|
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
|
|
apt-get install -y --no-install-recommends ./google-chrome-stable_current_amd64.deb
|
|
wget https://chromedriver.storage.googleapis.com/88.0.4324.96/chromedriver_linux64.zip
|
|
apt-get install -y zip
|
|
unzip chromedriver_linux64.zip
|
|
chmod +x chromedriver
|
|
mv chromedriver /usr/bin
|
|
apt-get autoremove -yqq --purge
|
|
apt-get clean
|
|
rm -f google-chrome-stable_current_amd64.deb chromedriver_linux64.zip
|
|
|
|
# Run
|
|
. {{ .Values.configMountPath }}/superset_bootstrap.sh; celery --app=superset.tasks.celery_app:app worker
|
|
```
|
|
|
|
##### Run the Celery beat
|
|
|
|
This pod will trigger the scheduled tasks configured in the alerts and reports UI section:
|
|
|
|
```yaml
|
|
supersetCeleryBeat:
|
|
enabled: true
|
|
```
|
|
|
|
##### Configure the appropriate Celery jobs and SMTP/Slack settings
|
|
|
|
```yaml
|
|
extraEnv:
|
|
SMTP_HOST: smtp.gmail.com
|
|
SMTP_USER: user@gmail.com
|
|
SMTP_PORT: "587"
|
|
SMTP_MAIL_FROM: user@gmail.com
|
|
|
|
extraSecretEnv:
|
|
SLACK_API_TOKEN: xoxb-xxxx-yyyy
|
|
SMTP_PASSWORD: xxxx-yyyy
|
|
|
|
configOverrides:
|
|
feature_flags: |
|
|
import ast
|
|
|
|
FEATURE_FLAGS = {
|
|
"ALERT_REPORTS": True
|
|
}
|
|
|
|
SMTP_HOST = os.getenv("SMTP_HOST","localhost")
|
|
SMTP_STARTTLS = ast.literal_eval(os.getenv("SMTP_STARTTLS", "True"))
|
|
SMTP_SSL = ast.literal_eval(os.getenv("SMTP_SSL", "False"))
|
|
SMTP_USER = os.getenv("SMTP_USER","superset")
|
|
SMTP_PORT = os.getenv("SMTP_PORT",25)
|
|
SMTP_PASSWORD = os.getenv("SMTP_PASSWORD","superset")
|
|
SMTP_MAIL_FROM = os.getenv("SMTP_MAIL_FROM","superset@superset.com")
|
|
|
|
SLACK_API_TOKEN = os.getenv("SLACK_API_TOKEN",None)
|
|
celery_conf: |
|
|
from celery.schedules import crontab
|
|
|
|
class CeleryConfig:
|
|
broker_url = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0"
|
|
imports = (
|
|
"superset.sql_lab",
|
|
"superset.tasks.cache",
|
|
"superset.tasks.scheduler",
|
|
)
|
|
result_backend = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0"
|
|
task_annotations = {
|
|
"sql_lab.get_sql_results": {
|
|
"rate_limit": "100/s",
|
|
},
|
|
}
|
|
beat_schedule = {
|
|
"reports.scheduler": {
|
|
"task": "reports.scheduler",
|
|
"schedule": crontab(minute="*", hour="*"),
|
|
},
|
|
"reports.prune_log": {
|
|
"task": "reports.prune_log",
|
|
'schedule': crontab(minute=0, hour=0),
|
|
},
|
|
'cache-warmup-hourly': {
|
|
"task": "cache-warmup",
|
|
"schedule": crontab(minute="*/30", hour="*"),
|
|
"kwargs": {
|
|
"strategy_name": "top_n_dashboards",
|
|
"top_n": 10,
|
|
"since": "7 days ago",
|
|
},
|
|
}
|
|
}
|
|
|
|
CELERY_CONFIG = CeleryConfig
|
|
reports: |
|
|
EMAIL_PAGE_RENDER_WAIT = 60
|
|
WEBDRIVER_BASEURL = "http://{{ template "superset.fullname" . }}:{{ .Values.service.port }}/"
|
|
WEBDRIVER_BASEURL_USER_FRIENDLY = "https://www.example.com/"
|
|
WEBDRIVER_TYPE= "chrome"
|
|
WEBDRIVER_OPTION_ARGS = [
|
|
"--force-device-scale-factor=2.0",
|
|
"--high-dpi-support=2.0",
|
|
"--headless",
|
|
"--disable-gpu",
|
|
"--disable-dev-shm-usage",
|
|
# This is required because our process runs as root (in order to install pip packages)
|
|
"--no-sandbox",
|
|
"--disable-setuid-sandbox",
|
|
"--disable-extensions",
|
|
]
|
|
```
|
|
#### Load the Examples data and dashboards
|
|
If you are trying Superset out and want some data and dashboards to explore, you can load some examples by creating a `my_values.yaml` and deploying it as described above in the **Configure your setting overrides** step of the **Running** section.
|
|
To load the examples, add the following to the `my_values.yaml` file:
|
|
```yaml
|
|
init:
|
|
loadExamples: true
|
|
```
|