Stanford EE364A Convex Optimization I | Stephen Boyd | 2023 | Lecture 12

Analytic Center

  • The analytic center of a set of inequalities is the point that maximizes the product of the margins for the inequalities.
  • Computing the analytic center is a convex problem (maximize the sum of the logs of the margins) and can be done efficiently; see the sketch after this list.
  • Near the analytic center, the level sets of the log barrier function are approximately ellipsoidal.
  • The ellipsoid obtained from the log barrier's Hessian at the analytic center lies inside the set, and scaled up by a factor on the order of m (the number of inequalities) it covers the set.
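A minimal CVXPY sketch of the analytic-center computation described above, assuming a polyhedron {x : Ax <= b} with a strictly feasible interior; the data here are made up for illustration.

```python
import cvxpy as cp
import numpy as np

# Hypothetical polyhedron {x : A x <= b}; data chosen so that x = 0 is strictly
# feasible and the polyhedron is (almost surely) bounded.
np.random.seed(0)
m, n = 50, 2
A = np.random.randn(m, n)
b = np.random.rand(m) + 0.1

x = cp.Variable(n)
# Analytic center: maximize the sum of log margins, i.e. the log of the
# product of the margins b_i - a_i^T x.
prob = cp.Problem(cp.Maximize(cp.sum(cp.log(b - A @ x))))
prob.solve()
print("analytic center:", x.value)
```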

Linear Discrimination

  • Linear discrimination is the problem of finding a hyperplane that separates two sets of points.
  • The strict separation inequalities cannot be handled directly, and simply relaxing them to non-strict inequalities admits the trivial solution a = 0, b = 0.
  • Because the inequalities are homogeneous in (a, b), they can instead be rescaled to the equivalent non-strict form a^T x_i - b >= 1, a^T y_j - b <= -1.
  • The problem of finding a separating hyperplane between two sets of vectors can be formulated as an LP feasibility problem.
  • The thickest separating slab can be found by solving a quadratic program (see the sketch after this list).
  • The dual of this QP has a nice interpretation: it computes the distance between the convex hulls of the two point sets.
  • Adding slack variables to the inequalities allows the method to handle point sets that are not linearly separable.
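A sketch of the thickest-slab (maximum-margin) formulation, assuming two linearly separable point sets; the synthetic data and variable names are illustrative, not from the lecture.

```python
import cvxpy as cp
import numpy as np

# Two synthetic point sets, assumed linearly separable for this sketch.
np.random.seed(1)
X = np.random.randn(40, 2) + np.array([4.0, 4.0])
Y = np.random.randn(40, 2) - np.array([4.0, 4.0])

a = cp.Variable(2)
b = cp.Variable()
# After normalizing the margins to 1, the slab {z : |a^T z - b| <= 1} has
# width 2/||a||_2, so minimizing ||a||_2 gives the thickest separating slab.
constraints = [X @ a - b >= 1, Y @ a - b <= -1]
prob = cp.Problem(cp.Minimize(cp.norm(a, 2)), constraints)
prob.solve()
print("slab width:", 2 / np.linalg.norm(a.value))
```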

Convex Relaxation

  • The speaker introduces the concept of minimizing the sum of slack variables in a linear programming problem.
  • They show how this can be used as a convex heuristic for (approximately) minimizing the number of misclassified points in a binary classification problem (see the sketch after this list).
  • The speaker discusses the relationship between the original problem and the convex relaxation and explains why the optimal value of the convex problem provides an upper bound on the number of misclassifications.
  • They mention that this approach is often used in practice and that it has been shown to work well in many cases.
  • The speaker gives an example from finance where a convex surrogate is used in place of a risk measure, and explains how the surrogate turned out to be what was actually wanted, rather than the original measure.
  • They emphasize that the convex relaxation can be better than the original problem in some cases and that it is important to choose the right surrogate function.
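A sketch of the sum-of-slacks heuristic discussed above, on made-up overlapping data; the variable names u, v and the data are assumptions for illustration.

```python
import cvxpy as cp
import numpy as np

# Overlapping synthetic point sets: exact separation is impossible, so we
# minimize the sum of slacks as a convex surrogate for the misclassification count.
np.random.seed(2)
X = np.random.randn(50, 2) + 1.5
Y = np.random.randn(50, 2) - 1.5

a, b = cp.Variable(2), cp.Variable()
u = cp.Variable(50, nonneg=True)   # slacks for the X inequalities
v = cp.Variable(50, nonneg=True)   # slacks for the Y inequalities
constraints = [X @ a - b >= 1 - u, Y @ a - b <= -1 + v]
prob = cp.Problem(cp.Minimize(cp.sum(u) + cp.sum(v)), constraints)
prob.solve()

misclassified = np.sum(X @ a.value - b.value < 0) + np.sum(Y @ a.value - b.value > 0)
print("relaxation value:", prob.value, " misclassified:", misclassified)
```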

Feature Engineering and Polynomial Discriminators

  • Feature engineering can be thought of as transforming input data through functions to create a larger vector for classification or separation.
  • Quadratic discrimination reduces to linear inequalities in the parameters P, q, and r of the quadratic function f(z) = z^T P z + q^T z + r (see the sketch after this list).
  • Separation by an ellipsoid can be used to identify a safe set of values for complex industrial processes.
  • Quartic and higher-degree polynomial discriminators can be found by iteratively increasing the degree until a successful separation is achieved.
  • Finding the minimum polynomial degree needed for separation is a quasiconvex problem, since feasibility is monotone in the degree, so it can be solved by bisection on the degree.
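A sketch of quadratic (ellipsoidal) discrimination: the constraints f(z) = z^T P z + q^T z + r >= 1 on one set and <= -1 on the other are linear in (P, q, r), and adding P ⪯ -I is one common way to force an ellipsoidal boundary, turning the feasibility problem into a small SDP. The data and the specific normalization below are illustrative assumptions.

```python
import cvxpy as cp
import numpy as np

# Inner cluster X, surrounding ring Y (made-up data): look for
# f(z) = z^T P z + q^T z + r with f >= 1 on X and f <= -1 on Y.
np.random.seed(3)
X = 0.5 * np.random.randn(60, 2)
ang = 2 * np.pi * np.random.rand(60)
Y = 3.0 * np.c_[np.cos(ang), np.sin(ang)]

P = cp.Variable((2, 2), symmetric=True)
q = cp.Variable(2)
r = cp.Variable()

def f(z):
    # z^T P z = trace(P z z^T) is linear in P, so each constraint below is
    # affine in (P, q, r).
    return cp.sum(cp.multiply(P, np.outer(z, z))) + q @ z + r

constraints = [f(x) >= 1 for x in X] + [f(y) <= -1 for y in Y]
constraints += [P << -np.eye(2)]   # one way to force an ellipsoidal boundary
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print("P =\n", P.value)
```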

Placement and Facility Location Problems

  • Placement and facility location problems involve choosing point locations to minimize a sum of penalty functions of the distances between connected pairs of points, with applications in warehouse location and circuit design.
  • Designing penalties and functions is a recurring theme in optimization problems.
  • Minimizing the sum of Euclidean distances between connected points tends to produce a compact arrangement in which many free points coincide, while minimizing the sum of squared distances spreads the points out more (see the sketch after this list).
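A small placement sketch comparing the sum-of-distances and sum-of-squared-distances objectives; the fixed terminals, free points, and wiring graph below are made up for illustration.

```python
import cvxpy as cp
import numpy as np

# A few fixed terminals, four free points, and a made-up wiring graph.
fixed = np.array([[1.0, 0.0], [-1.0, 0.5], [0.0, -1.0]])
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]   # edges between free points
ties = [(0, 0), (1, 1), (2, 2), (3, 0)]    # (free point, fixed terminal) pairs

def place(squared):
    x = cp.Variable((4, 2))
    dist = (lambda u, v: cp.sum_squares(u - v)) if squared else \
           (lambda u, v: cp.norm(u - v, 2))
    cost = sum(dist(x[i], x[j]) for i, j in edges) + \
           sum(dist(x[i], fixed[k]) for i, k in ties)
    cp.Problem(cp.Minimize(cost)).solve()
    return x.value

print("sum of distances:\n", place(False))
print("sum of squared distances:\n", place(True))
```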

Numerical Linear Algebra

  • The importance of understanding numerical linear algebra is emphasized, as it provides insights into the complexity and efficiency of solving linear equations.
  • The best known algorithms for solving linear equations take roughly n^2.37 flops, a significant improvement over the n^3 of standard methods; the best known lower bound is of order n^2.
  • Floating-point operations (flops) are used to estimate the complexity of algorithms.
  • BLAS operations are classified into three levels: level 1 (vector-vector operations), level 2 (matrix-vector operations), and level 3 (matrix-matrix operations).
  • Sparse matrices have a special structure that can be exploited for efficient computations.
  • Low-rank factorization is another structure that can be used for efficient computations.
  • Matrix-matrix multiplication can be done in about n^2.8 flops (Strassen-type algorithms), compared with n^3 for the standard method.
  • Matrix multiplication can be optimized for speed by breaking it down into smaller sub-blocks and utilizing the computer's cache hierarchy.
  • Modern computers can perform billions of floating-point operations per second, making matrix operations incredibly fast.
  • Solving triangular systems of equations is much faster than solving general systems, requiring order n^2 flops instead of order n^3 (a back-substitution sketch appears after this list).
  • Blocked implementations are used in practice for solving triangular systems, rather than solving them line by line.
  • To solve a linear system Ax = b, one can factorize A into a product of matrices that are easy to invert.
  • The inverse of A is then the product of the inverses of the factors in reverse order.
  • Solving Ax = b by this method involves a sequence of solves, each with one of the factors.
  • In the simplest cases, the factorization costs order n^3 flops, while the solve costs order n^2 flops.
  • Solving multiple sets of equations with the same coefficient matrix A can be done much faster than solving each set from scratch, because the factorization can be computed once and reused (see the factor-solve sketch after this list).
  • The LU factorization is the standard choice for general square systems; it costs on the order of (2/3)n^3 flops.
  • LU factorization is essentially Gaussian elimination, but the row-by-row elimination taught in introductory courses is not how linear systems are actually solved in practice; blocked, pivoted implementations are used instead.
  • Multi-core processors can perform multiple tasks simultaneously.
  • The concept of blocking and non-blocking code remains the same regardless of the number of cores.
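A back-substitution sketch for the triangular-solve point above: solving Rx = b for upper-triangular R takes about n^2 flops (practical libraries use blocked routines rather than this explicit loop).

```python
import numpy as np

def back_substitution(R, b):
    """Solve R x = b for upper-triangular R in about n^2 flops."""
    n = R.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

np.random.seed(0)
R = np.triu(np.random.randn(5, 5)) + 5 * np.eye(5)
b = np.random.randn(5)
print(np.allclose(back_substitution(R, b), np.linalg.solve(R, b)))  # True
```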
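A factor-solve sketch using SciPy's LU routines: A is factored once, and the cached factorization is reused for many right-hand sides; the sizes here are arbitrary.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Factor A once (order n^3 flops), then reuse the factorization for each
# right-hand side (order n^2 flops per solve).
np.random.seed(0)
n, k = 500, 20
A = np.random.randn(n, n)
B = np.random.randn(n, k)

lu, piv = lu_factor(A)                                  # one-time factorization
X = np.column_stack([lu_solve((lu, piv), B[:, j]) for j in range(k)])
print(np.allclose(A @ X, B))                            # all k systems solved
```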
