SolveDB: Integrating Optimization Problem Solvers and Prescriptive Analytics Into SQL Databases

SolveDB is a Database Management Systems (DBMS) with the native support for solvers for different kinds of optimization, constraint satisfaction, time series forecasting, and domain-specific problems. SolveDB aims at making database-based problem specification and solving much more easy, user-friendly, and efficient. To achieve these goals, SolveDB integrates solvers into the DBMS backend, offers in-DBMS processing optimizations, and provides a common language for database queries, problem specification, and user-defined solvers. SolveDB is based on PostgreSQL 9.6.

SolveDB is developed by the following Daisy members: Laurynas Šikšnys, Torben B. Pedersen, and Davide Frazzetto.

SolveDB: Integrating Optimization Problem Solvers

SolveDB comes with a number of pre-installed solvers for linear programming (LP), mixed-integer programming (MIP), global black-box optimization (GO), and domain-specific problems. Users may specify and solve their problems using a single so-called solve query in the following intuitive SQL-based syntax:

  SOLVESELECT col_name [, ...] IN ( select_stmt ) [AS alias]
       [ WITH col_name [, ...] IN ( select_stmt )  AS alias [, ...] ]
[ MINIMIZE ( select_stmt ) [ MAXIMIZE ( select_stmt ) ] |
  MAXIMIZE ( select_stmt ) [ MINIMIZE ( select_stmt ) ] ]
[ SUBJECTTO ( select_stmt ) [, ...] ]
[ USING solver_name [. ...] [( param[:= expr] [, ...] )] ]

Consider the following simple linear optimization problem:

Maximize \[0.6x_1+0.5x_2\]
Subject To \[x_1+2x_2 %5Cleq 1\]
\[3x_1+x_2 %5Cleq 2\]

This optimization problem can be specified in SolveDB as follows:

SOLVESELECT x1, x2 IN (SELECT x1, x2 FROM data) AS u
MAXIMIZE  (SELECT 0.6*x1 + 0.5*x2 FROM u)
SUBJECTTO (SELECT x1+2*x2 <= 1 FROM u),
          (SELECT 3*x1+x2 <= 2 FROM u)
USING solverlp();

This query produces the output relation from the given input relation:

Input relation (data)
x1 x2
Output relation
x1 x2
1 -1

The input relation is specified by the first inner-SELECT. The input relation has an alias u assigned. The alias is used to refer to the same common input relation in all the subsequent SELECTs. The (NULL) values of the specified columns x1 and x2 in the input relation are treated as decision variables. The SELECT in the MAXIMIZE clause specifies the objective function applicable to these variables. All subsequent SELECTs in the SUBJECTTO block specifies the constraints applicable to the decision variables. For solving this problem, the query specifies the use of a so-called view solver solverlp, which generates an (efficient) query plan to produce the output relation from the given input relation, the objective function(-s), and the constraints. During the execution of this query plan, the problem is transparently portioned into a number of sub-problems (if possible) and a so-called physical solver from the GLPK package is invoked for actual computations.

For domain-specific problems, users can develop their own user-defined solvers that rewrite simple SOLVESELECT queries to more complex SOLVESELECT queries, while (re-)using the solving capabilities of other existing view solvers, e.g., solverlp for generic LP/MIP problems. Consider the Sudoku problem, where the goal is to fill the missing Sudoku matrix values in the following input relation:

Sudoku input relation (in_d)
i j val
1 1 2
1 2 NULL
... ... ...
1 9 NULL
... ... ...
9 9 9
Sudoku output relation
i j val
1 1 2
1 2 9
... ... ...
1 9 5
... ... ...
9 9 9

Such a Sudoku constraints respecting output relation can be generated using the following solve query in SolveDB:

SELECT i, j, k AS val FROM(
  SOLVESELECT x IN (SELECT i, j, k, (val=k) AS giv, NULL::integer AS x
                    FROM in_d, generate_series(1,9) AS k) AS in_b
   (SELECT sum(x)=1 FROM in_b GROUP BY i, j),
   (SELECT sum(x)=1 FROM in_b GROUP BY k, i),
   (SELECT sum(x)=1 FROM in_b GROUP BY k, j),
   (SELECT sum(x)=1 FROM in_b GROUP BY k, (i-1)/3,(j-1)/3),
   (SELECT x = 1    FROM in_b WHERE giv),
   (SELECT 0<=x<=1  FROM in_b)
  USING solverlp()           ) AS out_b 
 WHERE out_b.x = 1

Less experienced users may also use a so-called composite user-defined solver sudoku_composite_solver, which is invoked using the following solve query:

SOLVESELECT val IN (SELECT i, j, val FROM in_d) AS r
USING sudoku_composite_solver()

Such composite solvers hide all problem formulation details, and may allow specifying additional (external) constraints, e.g., which allow restricting the digits in the Sudoku matrix:

SOLVESELECT val IN (SELECT i, j, val FROM in_d) AS r
USING sudoku_composite_solver()

The present SolveDB implementation provides two general-purpose view solvers for

  • LP problems, solved using GLPK or Coins CLP (activated using "USING solverlp.basic").
  • MIP problems, solved using GLPK or Coins CBC (activated using "USING solverlp.mip")
  • Black-box global optimization problems, solved with 1 of 15 applicable physical solvers from the SwarmOPS package:
    • Particle Swarm Optimization ("USING solversw.pso")
    • Differential Evolution with Temporal Parameters ("USING solversw.detp")"
    • Differential Evolution Suite ("USING solversw.desuite)
    • Jan. Differential Evolution ("USING solversw.jde")
    • Mesh iteration ("USING solversw.mesh")
    • Forever Accumulating Evolution ("USING solversw.fae")
    • More Yo-yos doing Global optimization ("USING solversw.myg")
    • Local Unimodal Sampling ("USING solversw.lus")
    • Simulated Annealing ("USING")
    • Many Optimizing Liaisons ("USING solversw.mol")
    • Layered and Interleaved Co-Evolution ("USING solversw.lice")
    • Differential Evolution ("USING
    • Random Sampling, Uniform ("USING solversw.rnd)
    • Pattern Search ("USING")
    • Hill-Climber ("USING solversw.hc")
    • Evolution by Lingering Global best" ("USING solversw.elg)

SolveDB+: Integrating Prescriptive Analytics

SolveDB+, a sub-project of SolveDB, is the first purely SQL relational DBMS to support Prescriptive Analytics, the new frontier of Business Analytics. Prescriptive Analytics brings data driven, adaptive, and automatic intelligent decision making and time-dependent planning.
SolveDB brings the power of Prescriptive Analytics to DBMS users by introducing an Extensible Predictive Framework, and Dynamic System Modeling. Common Prescriptive Analytics tasks involve the combination of data collection, cleaning, and transformation, prediction, system modeling, and optimization/scheduling. SolveDB+ provides support for all these phases in a single system.

SolveDB+ comes with a number of prediction solvers for time series forecasting (e.g. ARIMA, linear regression), that can be used seamlessly within a SOLVESELECT query. Considering the problem of predicting an item's order quantity in a warehouse during the next month, the SolveDB+ specification is as follows:

SOLVESELECT quantity IN (SELECT order_date, quantity FROM items_orders WHERE item = 1) 
USING arima_solver(predictions:=1, features:=order_date, p:= 1, d := 0, q := 1)

Given the input relation SELECT order_date, quantity FROM items_orders WHERE item = 1:

Input relation (data)
order_date quantity
1992-01-01 271838
1992-02-01 307123
1992-03-01 245683
1992-04-01 285612
1992-05-01 285479
1992-06-01 301879
Output relation
order_date quantity
1992-01-01 271838
1992-02-01 307123
1992-03-01 245683
1992-04-01 285612
1992-05-01 285479
1992-06-01 301879
1992-07-01 284615

SolveDB+ extends the solver-centric approach of SolveDB by allowing to specify prediction variables as decision variables of the SOLVESELECT query (in this case quantity), and to use one of the available prediction solvers (here arima_solver). The result of the query is the same input relation, but with one additional row (given the user parameter predictions := 1), containing the predicted item's order quantity. SolveDB+ hides from the users the use of the model and the forecasting tasks, while giving control over model selection (arima), model parameters (p,d, and q), and prediction task specification (number of predictions, model features). Alternatively, SolveDB+ also offers a simplified access point to time series forecasting, the predictive advisor, that effectively makes the forecasting process transparent to the user.

The previous query can then solve used the advisor predictive_solver:

SOLVESELECT quantity IN (SELECT order_date, quantity FROM items_orders WHERE item = 1) 
USING predictive_solver(predictions:=1)

The query process the input relation, identifying the time feature, automatically optimizing the model parameters, and producing the same result shown in the previous table.

SolveDB+ hides from the user standard time series forecasting steps, such as data preparation(e.g. identifying time series sample frequency, feature selection), training (e.g. splitting data into training/test/validation sets, training the model on the input relation), validation (cross validation of hyper-parameters, best model selection), and the final prediction, ensuring correctness and efficiency.

SolveDB+ syntax allows the user to specify nested SOLVESELECT statements. Therefore, it is possible to combine both optimization problem specification and forecasting queries declaratively within a single query. For example, the following query decides the number of items to keep in storage, in order to maximize the retail_price, under the constraint of their size being under a threshold:

   (SOLVESELECT quantity IN (SELECT order_date, quantity FROM items_orders) 
    USING predictive_solver(predictions:=1, group := item)) t
 WHERE order_date = 1992-07-01) AS u)
MAXIMIZE (SELECT retail_price FROM u)
USING solverlp

SolveDB+ also offer an interface for forecasting solvers extension. Therefore, it becomes easier for analyst and developers to integrate standard and custom models within the DBMS, making use of both the efficiency of in-DBMS analytics, and of the automatic forecasting workflow management provided by SolveDB+.

Interested in SolveDB or SolveDB+?