A Proximal Quasi-Newton Trust-Region Method for Nonsmooth Regularized Optimization

1 minute read

This was a project that I developed with Aleksander Aravkin and Dominique Orban. We attempt to tackle a very generic set of regularized problems f+h, where both may be nonconvex, and h may be nonsmooth. It is currently under review for SIAM Optimization, and ArXiv copy can be found here. I presented at the DOE CSGF Program Review and SIOPT virtually in July 2021.

We develop a trust-region method for minimizing the sum of a smooth term f and a nonsmooth term h), both of which can be nonconvex. Each iteration of our method minimizes a possibly nonconvex model of f+h in a trust region. The model coincides with f+h in value and subdifferential at the center. We establish global convergence to a first-order stationary point when f satisfies a smoothness condition that holds, in particular, when it has Lipschitz-continuous gradient, and h is proper and lower semi-continuous. The model of h is required to be proper, lower-semi-continuous and prox-bounded. Under these weak assumptions, we establish a worst-case O(1/ϵ^2) iteration complexity bound that matches the best known complexity bound of standard trust-region methods for smooth optimization. We detail a special instance, named TR-PG, in which we use a limited-memory quasi-Newton model of f and compute a step with the proximal gradient method, resulting in a practical proximal quasi-Newton method. We establish similar convergence properties and complexity bound for a quadratic regularization variant, named R2, and provide an interpretation as a proximal gradient method with adaptive step size for nonconvex problems. R2 may also be used to compute steps inside the trust-region method, resulting in an implementation named TR-R2. We describe our Julia implementations and report numerical results on inverse problems from sparse optimization and signal processing. Both TR-PG and TR-R2 exhibit promising performance and compare favorably with two linesearch proximal quasi-Newton methods based on convex models.

Data Fit with TR-PG Objective Descent with TR-PG Data fit with R2 Objective Descent with R2
Probably the most important final results, where we enforce an l0-norm constraint on a system of ODEs.