Optimality conditions of the Graphical Lasso

Let $K = \Sigma^{-1}$ and $S$ be respectively the precision matrix and the sample covariance matrix of a multivariate Gaussian distribution. The Graphical Lasso amounts to minimizing the following penalized negative log-likelihood
$$\min_K J(K) = -\log\det(K) + \mathrm{trace}(SK) + \lambda \|K\|_1,$$
where $\log\det$ is the logarithm of the determinant, $\lambda$ is a hyper-parameter controlling sparsity, and $\|K\|_1$ is the sum of the absolute values of the matrix entries.
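
For concreteness, here is a minimal NumPy sketch of the objective $J(K)$. The function name and the toy matrices are illustrative only, not part of any particular solver.

```python
import numpy as np

def graphical_lasso_objective(K, S, lam):
    """Penalized negative log-likelihood J(K) = -log det(K) + trace(S K) + lam * ||K||_1."""
    sign, logdet = np.linalg.slogdet(K)      # numerically stable log-determinant
    if sign <= 0:
        return np.inf                        # K must be positive definite
    return -logdet + np.trace(S @ K) + lam * np.abs(K).sum()

# toy example with 3x3 identity matrices
S = np.eye(3)
K = np.eye(3)
print(graphical_lasso_objective(K, S, lam=0.1))  # -0 + 3 + 0.1 * 3 = 3.3
```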

The function $J(K)$ is convex but not differentiable at $K_{ij} = 0$. So at optimality, we use the subgradient instead of the classical gradient. Note that the derivative of $\log\det(K)$ is $K^{-1}$ (Boyd and Vandenberghe, 2004). The Karush-Kuhn-Tucker (KKT) optimality conditions can be written as
$$-K^{-1} + S + \lambda\,\mathrm{Sign}(K) = 0,$$
where
$$\mathrm{Sign}(K_{ij}) = \begin{cases} -1 & \text{if } K_{ij} < 0,\\ [-1, 1] & \text{if } K_{ij} = 0,\\ +1 & \text{if } K_{ij} > 0. \end{cases}$$
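
These conditions can be checked numerically for a candidate solution: on entries where $K_{ij} \neq 0$ the (sub)gradient must vanish exactly, while on entries where $K_{ij} = 0$ the smooth part of the gradient must lie within $[-\lambda, \lambda]$. The sketch below is a hypothetical checker, assuming `K`, `S`, and `lam` are NumPy objects as in the previous snippet.

```python
import numpy as np

def check_kkt(K, S, lam, tol=1e-6):
    """Verify the Graphical Lasso KKT conditions for a candidate precision matrix K."""
    G = -np.linalg.inv(K) + S                 # smooth part of the (sub)gradient
    nonzero = np.abs(K) > tol
    # where K_ij != 0: G_ij + lam * sign(K_ij) must be (numerically) zero
    stationarity = np.all(np.abs(G[nonzero] + lam * np.sign(K[nonzero])) < tol)
    # where K_ij == 0: G_ij must lie in the box [-lam, lam]
    subgradient_box = np.all(np.abs(G[~nonzero]) <= lam + tol)
    return stationarity and subgradient_box
```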

The covariance matrix $\Sigma$ being positive definite, the precision matrix $K$ is also positive definite, so its diagonal entries are all positive and $\mathrm{Sign}(K_{ii}) = +1$. Let $K^{-1} = \hat{\Sigma}$. Applying the KKT conditions to the diagonal entries gives
$$K_{ii} > 0 \implies -\hat{\Sigma}_{ii} + S_{ii} + \lambda = 0,$$
and we get $\hat{\Sigma}_{ii} = S_{ii} + \lambda$.
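
A quick way to observe this identity numerically, assuming CVXPY and a conic solver such as SCS are available, is to solve the problem directly in its convex form and compare the diagonal of $\hat{\Sigma} = K^{-1}$ with that of $S + \lambda I$. This is a sketch for illustration, not the algorithm used by dedicated solvers.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
p, n, lam = 4, 200, 0.2
X = rng.standard_normal((n, p))
S = np.cov(X, rowvar=False, bias=True)       # empirical covariance matrix

K = cp.Variable((p, p), PSD=True)            # precision matrix, constrained PSD
objective = cp.Minimize(-cp.log_det(K) + cp.trace(S @ K) + lam * cp.sum(cp.abs(K)))
cp.Problem(objective).solve(solver=cp.SCS)

Sigma_hat = np.linalg.inv(K.value)
print(np.diag(Sigma_hat))                    # should be close to ...
print(np.diag(S) + lam)                      # ... S_ii + lambda
```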

This exact relationship between the diagonals of the estimated and empirical covariance matrices is used as the initialization in solvers of this optimization problem, such as glasso (Friedman et al., 2007).
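
In code, such a warm start could look like the following hypothetical helper, which simply adds $\lambda$ to the diagonal of the empirical covariance before the iterative updates begin.

```python
import numpy as np

def init_covariance(S, lam):
    """Warm-start covariance estimate: diagonal fixed to S_ii + lam by the KKT conditions."""
    W = S.copy()
    W[np.diag_indices_from(W)] += lam
    return W
```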
