This post covers a list of 11 software feature requirements that are necessary when adding stochastic variability to optimization models. In a second post, we'll discuss how modern optimization software platforms make complex stochastic optimization problems (e.g., linear programs) easy to model. Particular emphasis will be on problems that typically require stochastic programming, which are often difficult to model. Examples will be included.
Stochastic optimization problems are normally solved using Monte Carlo simulation: a batch process of n solves, each of which randomly regenerates objective function coefficients, matrix coefficients, or bound values from the specified distribution function and parameters. The goal of any Monte Carlo simulation is to generate a sample large enough that the resulting set of solutions is statistically significant.
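To make the batch process concrete, here is a minimal Python sketch of a Monte Carlo batch over a toy two-variable linear program, using numpy for sampling and scipy's linprog as a stand-in solver. The model, the distributions, and every parameter value are invented purely for illustration:

```python
import numpy as np
from scipy.optimize import linprog

def monte_carlo_lp(n_solves, seed=None):
    """Run a batch of n_solves LP solves, redrawing the stochastic
    values each time, and collect the objective values."""
    rng = np.random.default_rng(seed)
    objectives = []
    for _ in range(n_solves):
        # Objective coefficient c[0] is stochastic (normal distribution);
        # everything else in the objective is deterministic.
        c = np.array([rng.normal(loc=3.0, scale=0.5), 2.0])
        # The right-hand side of the capacity constraint x0 + x1 <= b
        # is also stochastic (triangular distribution).
        A_ub = np.array([[1.0, 1.0]])
        b_ub = np.array([rng.triangular(8.0, 10.0, 14.0)])
        # linprog minimizes, so negate c to maximize c @ x.
        res = linprog(-c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 5), (0, 5)])
        objectives.append(-res.fun)
    return np.array(objectives)

samples = monte_carlo_lp(n_solves=1000, seed=7)
print(f"mean objective: {samples.mean():.2f}  std dev: {samples.std():.2f}")
```

The returned array is the sample whose size determines whether the results are statistically significant.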
Although the end goal might be similar regardless of the software used, the effort required to accurately model stochastic optimization problems can vary greatly. Depending on the particular approach, and especially the type of software used, the process can range from quick and easy to very difficult and time consuming.
Based on my past experience, the typical stochastic modeling exercise using a third-generation language or a fourth-generation algebraic modeling language is a slow, code-intensive affair: every stochastic definition is hand-coded, tested, and debugged, and every change to those definitions starts the cycle over.
Contrast this with modern, fifth-generation programming languages that offer a code-free approach, where deterministic models can be made stochastic in minutes (and back again) simply by changing data.
To summarize the uniqueness of this approach, I've compiled a list of the top eleven 'must-have' software requirements for building stochastic optimization models. These requirements are new in the sense that the third- and fourth-generation languages used for the last few decades generally do not offer these features.
This list is aimed particularly at modelers and end users who are either currently running optimization-based Monte Carlo simulations or evaluating software platforms in anticipation of future use.
Point #1: Holistic optimization models should require no explicit separation (i.e., code branching) between deterministic and stochastic modes. By default, if no active stochastic data is defined, the model is considered deterministic. If stochastic data is defined and active, the model is considered stochastic. Requiring a model to be hard-coded as either deterministic or stochastic is too restrictive, so flexibility is key.
Point #2: Stochastic definitions, including the specification of the probability distribution function and its parameters, should be entirely data driven. The end user should be free to select the required distribution from a list of choices. Parameters should always be data driven, based on the end user's distribution function selection. They should never be hard-coded.
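As a sketch of what "entirely data driven" can look like, the hypothetical table below holds each element's distribution choice and parameters as plain data, and one generic routine dispatches on the user's selection (the element names and parameter values are made up). Note that an empty table naturally yields a deterministic model, which is exactly the no-code-branching behavior Point #1 asks for:

```python
import numpy as np

# Hypothetical data-driven stochastic definitions: the end user selects a
# distribution and its parameters as data; nothing is hard-coded.
stochastic_defs = {
    "machine_hours.capacity": {"dist": "normal",
                               "params": {"loc": 160.0, "scale": 12.0}},
    "demand.widgets":         {"dist": "triangular",
                               "params": {"left": 900, "mode": 1000, "right": 1300}},
    "scrap_rate":             {"dist": "uniform",
                               "params": {"low": 0.01, "high": 0.05}},
}

def sample_all(defs, rng):
    # Dispatch on the distribution name; numpy's Generator exposes
    # normal, triangular, uniform, etc. as methods.
    return {name: getattr(rng, d["dist"])(**d["params"])
            for name, d in defs.items()}

print(sample_all(stochastic_defs, np.random.default_rng()))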
Point #3: Oftentimes, end users eventually ask for stochastic definition support for something not currently in their model. These requests become change orders, which can require extra work that takes days or even weeks to implement. Instead, seek out modeling platforms with out-of-the-box support for applying stochastic variability to all types of model data, including objective function coefficients, constraint matrix coefficients, and bound values.
Point #4: Stochastic definitions should be flexible enough to be assigned to each variable explicitly. In third- and fourth-generation software languages this is normally highly impractical, if not impossible. For example, a model with 100,000 variables might have 1,000 variables whose matrix coefficient values are randomly generated each time the matrix is generated. Although it is extremely unlikely that all 1,000 would have a different distribution function assigned, it is often the case that, even among the same type of variable (e.g., machine hours), the variability of unplanned downtime is larger in future time periods than in those closer to t=0.
For older optimization modeling platforms this requires messy code (e.g., IF statements) that effectively splits the same set of decision variables into multiple subsets, with each subset having a different distribution function assigned. But why go to this effort if it isn't needed? A data-driven approach lets the end user assign a different distribution function to every coefficient if necessary, as sketched below.
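Here is a small Python sketch of that contrast, with hypothetical names throughout: first the hard-coded IF-statement split to avoid, then a data-driven layout in which each coefficient row carries its own distribution:

```python
import numpy as np

# What to avoid: IF statements splitting one variable set into subsets,
# with each subset's distribution buried in the code.
def downtime_stddev_hardcoded(period):
    if period <= 4:          # near-term periods: low variability
        return 1.0
    else:                    # far-future periods: high variability
        return 3.0

# Data-driven alternative (hypothetical layout): every coefficient can
# carry its own distribution, so changing a definition is a data edit.
downtime_defs = {
    ("machine_A", 1): {"dist": "normal", "params": {"loc": 5.0, "scale": 1.0}},
    ("machine_A", 8): {"dist": "normal", "params": {"loc": 5.0, "scale": 3.0}},
}

def downtime_hours(machine, period, rng):
    d = downtime_defs[(machine, period)]
    return getattr(rng, d["dist"])(**d["params"])

rng = np.random.default_rng()
print(downtime_hours("machine_A", 8, rng))
```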
Point #5: Since uncertainty frequently increases further into the future, most stochastic models define a time horizon using multiple time periods (buckets). Specifying a single distribution function for the same set of variables across all periods is usually not too difficult. The best modeling platforms, however, also let users keep the same distribution function while varying its parameters by time period, or assign an entirely different distribution function to each period.
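A minimal sketch of the middle option, assuming a hypothetical data layout in which one distribution family keeps per-period parameters so the spread widens for buckets farther from t=0:

```python
import numpy as np

# Hypothetical per-period parameters: the distribution family stays the
# same, but its scale grows for later time buckets.
demand_def = {
    "dist": "normal",
    "params_by_period": {
        1: {"loc": 1000.0, "scale": 50.0},   # near-term: tight
        2: {"loc": 1000.0, "scale": 80.0},
        3: {"loc": 1000.0, "scale": 120.0},  # far-future: wide
    },
}

def sample_demand(period, rng):
    params = demand_def["params_by_period"][period]
    return getattr(rng, demand_def["dist"])(**params)

rng = np.random.default_rng()
print([round(sample_demand(p, rng)) for p in (1, 2, 3)])
```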
Point #6: This point can be particularly problematic. The platform should randomly generate stochastic values during the matrix generation step itself; no separate logic should be required to generate these values apart from the normal solve process. Any approach that requires the user to generate stochastic values outside the normal matrix generation step, particularly if they must then be input as data, should be avoided if possible.
Point #7: Point #1 addressed the model globally: the user should not have to explicitly set a flag to generate stochastic data. The same requirement does not apply at the individual variable level. The best modeling platforms let the end user toggle sets of stochastic definitions, or even individual data elements, from deterministic to stochastic and back again. This technique can be very useful for "freezing" particular stochastic data in order to test the sensitivity of the solution to other stochastic data.
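One plausible data layout for this toggle is an "active" flag per definition, as in the hypothetical sketch below: flipping a flag to False freezes that element at its deterministic value while everything else stays stochastic:

```python
import numpy as np

# Each definition carries an "active" flag; no code changes are needed
# to freeze or un-freeze an element.
defs = {
    "demand":     {"active": True,  "dist": "normal",
                   "params": {"loc": 1000.0, "scale": 80.0}, "default": 1000.0},
    "scrap_rate": {"active": False, "dist": "uniform",
                   "params": {"low": 0.01, "high": 0.05}, "default": 0.03},
}

def value(name, rng):
    d = defs[name]
    if not d["active"]:
        return d["default"]   # frozen: the deterministic value is used
    return getattr(rng, d["dist"])(**d["params"])

rng = np.random.default_rng()
print(value("demand", rng), value("scrap_rate", rng))
```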
Point #8: A common complaint about stochastic optimization is this: if the numbers are randomly generated, how can the same solution (i.e., the same model objective function value) be recreated when needed? Modern optimization software solves this issue easily by storing a seed value with each solve (e.g., 1363092240). As long as a copy of the model is saved so that all deterministic input data remains the same, the seed value can be input in the future to force the same prior solution. This is particularly useful for Monte Carlo simulations in which some additional processes are omitted until the best overall solution is found; the seed value can then be retrieved and the model re-solved with all additional processes executed.
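The mechanism is the ordinary seeded pseudo-random generator. A minimal numpy illustration, reusing the example seed value above:

```python
import numpy as np

seed = 1363092240                 # the seed value stored with the solve
draws_then = np.random.default_rng(seed).normal(5.0, 1.0, size=4)
draws_now  = np.random.default_rng(seed).normal(5.0, 1.0, size=4)

# Same seed plus same deterministic inputs -> identical generated values,
# and therefore an identical matrix and identical solution.
assert np.allclose(draws_then, draws_now)
```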
Point #9: End users are frequently confused by the use of stochastic variables because they cannot "see" them in the model. The best optimization modeling platforms, like those built on fifth-generation languages, include graphical user interface visualizations that identify and highlight where active stochastic definitions exist in the model, what they are, and how they affect the rest of the model.
Point #10: Look for an optimization platform with built-in data checking. Generating stochastic values, such as those used for variable constraints, can easily cause data errors; for example, a generated minimum constraint value can exceed a deterministic maximum constraint value. Any software package that requires the modeler to code such checks from scratch will take much longer to work with. The best optimization modeling platforms include a large library of built-in data checks that execute between the time the random values are generated and the time the model is solved. If a generated random value is inconsistent with other data or likely to cause an infeasibility, an error message is generated and the user is alerted.
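A sketch of one such check, written here as a standalone Python function for illustration; in a real platform it would be one entry in the built-in library, executed after value generation and before the solve:

```python
import numpy as np

def check_min_vs_max(name, generated_min, fixed_max):
    """Flag any randomly generated minimum bound that exceeds its
    deterministic maximum before the solver ever sees it."""
    if generated_min > fixed_max:
        raise ValueError(
            f"{name}: generated minimum {generated_min:.1f} exceeds "
            f"maximum {fixed_max:.1f}; the model would be infeasible."
        )

rng = np.random.default_rng()
gen_min = rng.normal(95.0, 10.0)   # a bad draw here can exceed 100
check_min_vs_max("line_1.min_production", gen_min, fixed_max=100.0)
```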
Point #11: Along the same lines as defining stochastic data, look for optimization modeling platforms that make the process of running Monte Carlo simulations easy. Preferably, any script needed to automate the process should be very simple to create (if printed out, a half page or less of code). Anything overly complicated should be avoided, since too much complexity makes future support much more difficult. Ideally, automating the process should be straightforward, with very little documentation needed to get started.
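For scale, a driver script in that spirit can be just a few lines. The sketch below assumes a hypothetical platform API (load_model, model.solve, model.objective); every one of those names is a placeholder for your platform's equivalents:

```python
import numpy as np

N_SOLVES = 500
results = []
for _ in range(N_SOLVES):
    seed = np.random.SeedSequence().entropy   # fresh seed, kept with the result
    model = load_model("plant_schedule")      # hypothetical platform call
    model.solve(seed=seed)                    # regenerates stochastic values
    results.append((seed, model.objective))

best_seed, best_obj = max(results, key=lambda r: r[1])
print(f"best objective {best_obj:,.2f} came from seed {best_seed}")
```

Each result is stored alongside its seed, so the best solution can later be reproduced exactly, per Point #8.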
It is my hope that this list helps to better inform those who are currently running optimization-based Monte Carlo simulations, as well as those seeking software solutions for such simulations. By using the most capable software solution, modelers and end users can achieve significantly better results in far less time than was previously possible.