* Add ability to inject statsd client; some py test/reqs updates
- Updated the metrics logger to allow construction with an existing
statsd client, so that it can be configured by external systems or libs.
- added requirements to requirements-dev.txt which are needed to run
tests-eg coverage, nose
- removed dependency on mock lib, it is in python stdlib now
- updated tox.ini to remove the now-superfluous deps
* add license to test file, and remove blank line at EOF
* Bump sqla to >=1.3.1
* Refine mssql column types to only use N-prefixing when necessary
* make join explicit
* replace set with list
* Add additional test case for N-prefix
* Replace engine with dialect and fix linting error
* Remove unneeded import
* Merge dataframe and column name mutation logic, add flag for disabling column aliases and add column name length checking
* Remove custome mutate_label from oracle spec
* Move hashing from mutate_label() to make_label_compatible()
* Remove empty line
* Make label mutating and truncating more robust
* Rename variables and make proposed changes from review
* Always execute labels_expected codepath
* Fix linting error
* Add comments and fix subquery errors
* Refine column compatibility
* Simplify label assignment
* Add unit tests for BQ and Oracle
* Linting
* Add interim grains
* Refactor and add blacklist
* Change PT30M to PT0.5H
* Linting
* Linting
* Add time grain addons to config.py and refactor engine spec logic
* Remove redundant import and clean up config.py
* Fix bad rebase
* Implement changes proposed by @betodealmeida
* Revert removal of name from Grain
* Linting
* [sql lab] extract Hive error messages
So pyhive returns an exception object with a stringified thrift error
object. This PR uses a regex to extract the errorMessage portion of that
string.
* Unit test
* Improve database type inference
Python's DBAPI isn't super clear and homogeneous on the
cursor.description specification, and this PR attempts to improve
inferring the datatypes returned in the cursor.
This work started around Presto's TIMESTAMP type being mishandled as
string as the database driver (pyhive) returns it as a string. The work
here fixes this bug and does a better job at inferring MySQL and Presto types.
It also creates a new method in db_engine_specs allowing for other
databases engines to implement and become more precise on type-inference
as needed.
* Fixing tests
* Adressing comments
* Using infer_objects
* Removing faulty line
* Addressing PrestoSpec redundant method comment
* Fix rebase issue
* Fix tests
* [sql lab] a better approach at limiting queries
Currently there are two mechanisms that we use to enforce the row
limiting constraints, depending on the database engine:
1. use dbapi's `cursor.fetchmany()`
2. wrap the SQL into a limiting subquery
Method 1 isn't great as it can result in the database server storing
larger than required result sets in memory expecting another fetch
command while we know we don't need that.
Method 2 has a positive side of working with all database engines,
whether they use LIMIT, ROWNUM, TOP or whatever else since sqlalchemy
does the work as specified for the dialect. On the downside though
the query optimizer might not be able to optimize this as much as an
approach that doesn't use a subquery.
Since most modern DBs use the LIMIT syntax, this adds a regex approach
to modify the query and force a LIMIT clause without using a subquery
for the database that support this syntax and uses method 2 for all
others.
* Fixing build
* Fix lint
* Added more tests
* Fix tests
* [sqllab] improve Hive support
* Fix "Transport not open" bug
* Getting progress bar to show
* Bump pyhive to 0.4.0
* Getting [Track Job] button to show
* Fix testzz