Currently, even though `get_sqla_engine` calls get memoized, engines are still short-lived since they are attached to a `models.Database` ORM object. All engines created through this method are scoped to a web request. Since these SQLAlchemy objects are short-lived, any related connection pool is also short-lived and mostly useless: it's pretty rare for connections to get reused within the context of a view or a Celery worker task.

We've noticed on Redshift that Superset was leaving many connections open (hundreds). This is probably due to a combination of the current process not garbage-collecting connections properly, and perhaps the absence of a connection timeout on the Redshift side of things. It could also be related to the fact that we experience web request timeouts (enforced by gunicorn), and that process-killing may not allow SQLAlchemy to clean up connections as it occurs (which this PR may not help fix...).

For all these reasons, using `NullPool` for external connections (but not for our connection to the metadata db!) seems like the right thing to do.

Opening the PR for conversation. Putting this change into our staging today to run some tests.
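A minimal sketch of the idea (the URIs and variable names here are illustrative, not the exact Superset code): external analytics databases get a `NullPool`, so every connection is closed as soon as it is checked back in and nothing outlives the request or task, while the metadata db keeps SQLAlchemy's default pooling.

```python
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool, QueuePool

# Hypothetical external database URI; a real deployment would use the
# Redshift/Postgres URI stored on the models.Database record.
EXTERNAL_URI = "sqlite:///external.db"

# External connection: NullPool opens a fresh connection per checkout
# and closes it on checkin, so no idle connections linger after the
# web request or Celery task finishes.
external_engine = create_engine(EXTERNAL_URI, poolclass=NullPool)

# Metadata db keeps regular pooling (QueuePool, SQLAlchemy's default
# for most dialects) so the web server can reuse connections.
metadata_engine = create_engine("sqlite:///superset.db", poolclass=QueuePool)
```

The trade-off is that `NullPool` pays the connection-setup cost on every query, which is acceptable here precisely because the engines are request-scoped and connections were rarely being reused anyway.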
README.md
Superset

Apache Superset (incubating) is a modern, enterprise-ready business intelligence web application
[this project used to be named Caravel, and Panoramix in the past]
Screenshots & Gifs
View Dashboards
View/Edit a Slice
Query and Visualize with SQL Lab
Apache Superset
Apache Superset is a data exploration and visualization web application.
Superset provides:
- An intuitive interface to explore and visualize datasets, and create interactive dashboards.
- A wide array of beautiful visualizations to showcase your data.
- Easy, code-free user flows to drill down and slice and dice the data underlying exposed dashboards. The dashboards and charts act as a starting point for deeper analysis.
- A state of the art SQL editor/IDE exposing a rich metadata browser, and an easy workflow to create visualizations out of any result set.
- An extensible, high-granularity security model allowing intricate rules on who can access which product features and datasets. Integration with major authentication backends (database, OpenID, LDAP, OAuth, REMOTE_USER, ...)
- A lightweight semantic layer, allowing you to control how data sources are exposed to the user by defining dimensions and metrics
- Out of the box support for most SQL-speaking databases
- Deep integration with Druid allows for Superset to stay blazing fast while slicing and dicing large, realtime datasets
- Fast loading dashboards with configurable caching
Database Support
Superset speaks many SQL dialects through SQLAlchemy, a Python ORM that is compatible with most common databases.
Superset can be used to visualize data out of most databases:
- MySQL
- Postgres
- Vertica
- Oracle
- Microsoft SQL Server
- SQLite
- Greenplum
- Firebird
- MariaDB
- Sybase
- IBM DB2
- Exasol
- MonetDB
- Snowflake
- Redshift
- more! Look for the availability of a SQLAlchemy dialect for your database to find out whether it will work with Superset
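Since database support comes through SQLAlchemy, each backend above is addressed by a connection URI whose dialect prefix selects the driver. A short sketch (hostnames and credentials are placeholders, not real endpoints):

```python
# Parse a few example SQLAlchemy connection URIs. Superset can talk to
# any database for which a SQLAlchemy dialect and DBAPI driver are
# installed; the scheme part of the URI names the dialect.
from sqlalchemy.engine.url import make_url

example_uris = [
    "mysql://user:password@localhost/superset_demo",
    "postgresql+psycopg2://user:password@localhost/analytics",
    "sqlite:////tmp/superset_test.db",
]

for uri in example_uris:
    url = make_url(uri)
    print(url.drivername, url.database)
```

`make_url` only parses the string, so this runs without the actual database drivers; connecting would additionally require the matching DBAPI package (e.g. `psycopg2` for Postgres).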
Druid!
On top of having the ability to query your relational databases, Superset ships with deep integration with Druid (a real-time distributed column-store). When querying Druid, Superset can query humongous amounts of data on top of real-time datasets. Note that Superset does not require Druid in any way to function; it's simply another database backend that it can query.
Here's a description of Druid from the http://druid.io website:
Druid is an open-source analytics data store designed for business intelligence (OLAP) queries on event data. Druid provides low latency (real-time) data ingestion, flexible data exploration, and fast data aggregation. Existing Druid deployments have scaled to trillions of events and petabytes of data. Druid is best used to power analytic dashboards and applications.
Installation & Configuration
Resources
- Mailing list
- Gitter (live chat) Channel
- Docker image (community contributed)
- Slides from Strata (March 2016)
- Stackoverflow tag
- DEPRECATED Google Group
Contributing
Interested in contributing? Casual hacking? Check out CONTRIBUTING.md
Who uses Apache Superset (incubating)?
Here's a list of organizations that have taken the time to send a PR to let the world know they are using Superset. Join our growing community!
- AiHello
- Airbnb
- Amino
- Brilliant.org
- Capital Service S.A.
- Clark.de
- Digit Game Studios
- Douban
- Endress+Hauser
- FBK - ICT center
- Faasos
- GfK Data Lab
- Konfío
- Lyft
- Maieutical Labs
- Ona
- Pronto Tools
- Qunar
- Shopee
- Shopkick
- Tails.com
- Tobii
- Tooploox
- Udemy
- VIPKID
- Yahoo!
- Zalando