Let $K$ and $S$ be, respectively, the precision and sample covariance matrices of a multivariate Gaussian distribution.
The Graphical Lasso amounts to minimizing the following penalized negative log-likelihood:

$$\hat{K} = \operatorname*{arg\,min}_{K \succ 0} \; -\log\det(K) + \operatorname{tr}(SK) + \lambda \lVert K \rVert_1,$$

where $\log\det$ is the logarithm of the determinant, $\lambda$ is a hyper-parameter controlling sparsity, and $\lVert K \rVert_1 = \sum_{i,j} |K_{ij}|$ is the sum of the absolute values of the matrix entries.
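As a sanity check, this objective is straightforward to evaluate numerically. Below is a minimal NumPy sketch; the helper name `graphical_lasso_objective` is ours, not from any library.

```python
import numpy as np

def graphical_lasso_objective(K, S, lam):
    """Evaluate -log det(K) + tr(SK) + lam * ||K||_1 (illustrative helper)."""
    # slogdet is numerically safer than log(det(K)) for the log-likelihood term
    sign, logdet = np.linalg.slogdet(K)
    assert sign > 0, "K must be positive definite"
    # tr(SK) is the data-fit term; the l1 norm is taken entrywise
    return -logdet + np.trace(S @ K) + lam * np.abs(K).sum()
```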
The objective is convex but not differentiable at $K_{ij} = 0$, so at optimality we work with the subgradient instead of the classic gradient. Note that the derivative of $\log\det(K)$ with respect to $K$ is $K^{-1}$ (Boyd and Vandenberghe, 2004).
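This identity is easy to verify with finite differences on a small symmetric positive definite matrix (a quick illustrative check, not part of any solver):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
K = A @ A.T + 4 * np.eye(4)  # symmetric positive definite

logdet = lambda M: np.linalg.slogdet(M)[1]
eps = 1e-6
num_grad = np.zeros_like(K)
for i in range(4):
    for j in range(4):
        E = np.zeros_like(K)
        E[i, j] = eps
        num_grad[i, j] = (logdet(K + E) - logdet(K)) / eps

# The finite-difference gradient matches K^{-1} entrywise
print(np.allclose(num_grad, np.linalg.inv(K), atol=1e-4))  # True
```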
The Karush-Kuhn-Tucker (KKT) optimality conditions can be written as:

$$-\hat{K}^{-1} + S + \lambda \Gamma = 0,$$

where $\Gamma$ is a subgradient of $\lVert K \rVert_1$ at $\hat{K}$, with entries

$$\Gamma_{ij} \begin{cases} = \operatorname{sign}(\hat{K}_{ij}) & \text{if } \hat{K}_{ij} \neq 0, \\ \in [-1, 1] & \text{if } \hat{K}_{ij} = 0. \end{cases}$$
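In code, these conditions translate into a simple optimality check: the gradient of the smooth part, $-\hat{K}^{-1} + S$, must equal $-\lambda \operatorname{sign}(\hat{K}_{ij})$ on the support of $\hat{K}$ and lie in $[-\lambda, \lambda]$ off it. A hedged sketch (the helper name and tolerance are our choices):

```python
import numpy as np

def kkt_violation(K_hat, S, lam, tol=1e-8):
    """Largest violation of the Graphical Lasso KKT conditions at K_hat."""
    G = -np.linalg.inv(K_hat) + S              # gradient of the smooth part
    nonzero = np.abs(K_hat) > tol
    # On nonzero entries, G_ij + lam * sign(K_ij) must vanish
    viol_nz = np.abs(G + lam * np.sign(K_hat))[nonzero]
    # On zero entries, |G_ij| <= lam (subgradient coefficient in [-1, 1])
    viol_z = np.maximum(np.abs(G[~nonzero]) - lam, 0.0)
    return max(viol_nz.max(initial=0.0), viol_z.max(initial=0.0))
```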
The covariance matrix being positive definite, the precision matrix is also positive definite, so the diagonal entries of $\hat{K}$ are all positive. Let $\hat{\Sigma} = \hat{K}^{-1}$ be the estimated covariance matrix. On the diagonal, $\hat{K}_{ii} > 0$ implies $\Gamma_{ii} = \operatorname{sign}(\hat{K}_{ii}) = 1$, so the KKT conditions give

$$\hat{K}_{ii} > 0 \implies -\hat{\Sigma}_{ii} + S_{ii} + \lambda = 0 \implies \hat{\Sigma}_{ii} = S_{ii} + \lambda.$$
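The identity can be checked numerically by solving the problem above directly, for instance with CVXPY (assuming it is installed with a conic solver such as SCS; agreement holds up to solver tolerance):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
p, n, lam = 5, 200, 0.1
X = rng.standard_normal((n, p))
S = np.cov(X, rowvar=False)

# Solve min_{K > 0} -log det(K) + tr(SK) + lam * ||K||_1
K = cp.Variable((p, p), PSD=True)
cost = -cp.log_det(K) + cp.trace(S @ K) + lam * cp.sum(cp.abs(K))
cp.Problem(cp.Minimize(cost)).solve()

Sigma_hat = np.linalg.inv(K.value)
print(np.diag(Sigma_hat) - (np.diag(S) + lam))  # ~0, up to solver tolerance
```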
This exact relationship between the diagonals of the estimated and empirical covariance matrices is used as the initialization in solvers for the optimization problem above, such as glasso (Friedman et al., 2007).
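Concretely, a glasso-style solver can start its working covariance estimate $W$ from the empirical covariance with the diagonal already set to its exact optimal value. A minimal sketch of that initialization step (actual implementations such as glasso handle many further details):

```python
import numpy as np

def init_working_covariance(S, lam):
    """Initialize W = S with W_ii = S_ii + lam, the exact optimal diagonal."""
    W = S.copy()
    W[np.diag_indices_from(W)] += lam  # diagonal is fixed by the KKT conditions
    return W
```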