# Optimality conditions of the Graphical Lasso

Let $\mathbf K = \mathbf \Sigma^{-1}$ and $\mathbf S$ be, respectively, the precision and sample covariance matrices of a multivariate Gaussian distribution. The Graphical Lasso amounts to minimizing the following penalized negative log-likelihood: $$\min_{\mathbf K} \mathcal J(\mathbf K) = -\operatorname{logdet}(\mathbf K) + \operatorname{trace}(\mathbf S \mathbf K) + \lambda ||\mathbf K||_1,$$ where $\operatorname{logdet}$ is the logarithm of the determinant, $\lambda$ is a hyper-parameter controlling sparsity, and $||\mathbf K||_1$ is the sum of the absolute values of the matrix entries.
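For concreteness, the objective can be written in a few lines of NumPy (a minimal sketch; the function name is ours, and `slogdet` is used for numerical stability):

```python
import numpy as np

def graphical_lasso_objective(K, S, lam):
    """J(K) = -logdet(K) + trace(S K) + lam * ||K||_1 (elementwise l1 norm)."""
    sign, logabsdet = np.linalg.slogdet(K)
    if sign <= 0:
        raise ValueError("K must be positive definite")
    return -logabsdet + np.trace(S @ K) + lam * np.abs(K).sum()

# Example: for K = S = I_3 and lam = 0.1, J = 0 + 3 + 0.3 = 3.3
print(graphical_lasso_objective(np.eye(3), np.eye(3), 0.1))  # ≈ 3.3
```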

The function $\mathcal J(\mathbf K)$ is convex but not differentiable at $K_{ij} = 0$, so at optimality we use the subgradient instead of the classic gradient. Note that the derivative of $\operatorname{logdet}(\mathbf K)$ is $\mathbf K^{-1}$ (_Boyd and Vandenberghe, 2004_). The **Karush-Kuhn-Tucker (KKT) optimality conditions** can be written as $$-\mathbf K^{-1} + \mathbf S + \lambda \operatorname{Sign}(\mathbf K) = 0,$$ where $$\operatorname{Sign}(K_{ij}) = \begin{cases} -1, & \text{if $K_{ij} < 0$}\\
\in [-1,1], & \text{if $K_{ij} = 0$} \\
+1, & \text{if $K_{ij} > 0$.} \end{cases}$$
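These conditions can be checked numerically on a candidate solution: at an optimum, $S_{ij} - \hat\Sigma_{ij} + \lambda \operatorname{Sign}(K_{ij}) = 0$ for nonzero entries, while $|S_{ij} - \hat\Sigma_{ij}| \le \lambda$ wherever $K_{ij} = 0$. A minimal NumPy sketch (the helper name `kkt_violation` is ours):

```python
import numpy as np

def kkt_violation(K, S, lam, tol=1e-10):
    """Largest violation of the Graphical Lasso KKT conditions at K.

    For entries with K_ij != 0, stationarity requires
    S_ij - (K^{-1})_ij + lam * sign(K_ij) = 0; for K_ij == 0 the
    subgradient condition only requires |S_ij - (K^{-1})_ij| <= lam.
    """
    G = S - np.linalg.inv(K)           # gradient of the smooth part
    nz = np.abs(K) > tol               # support of K
    v_nz = np.abs(G[nz] + lam * np.sign(K[nz])).max() if nz.any() else 0.0
    v_z = np.clip(np.abs(G[~nz]) - lam, 0.0, None).max() if (~nz).any() else 0.0
    return max(v_nz, v_z)

# When S is diagonal, K = diag(1 / (S_ii + lam)) satisfies the conditions exactly.
S = np.diag([1.0, 2.0, 3.0])
lam = 0.5
K = np.diag(1.0 / (np.diag(S) + lam))
print(kkt_violation(K, S, lam))  # 0.0
```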

The covariance matrix $\mathbf \Sigma$ being positive definite, the precision matrix $\mathbf K$ is also positive definite, so the diagonal entries of $\mathbf K$ are all positive. Let $\hat{\mathbf{\Sigma}} = \mathbf K^{-1}.$ Applying the KKT conditions to the diagonal entries, $$K_{ii}>0 \implies -\hat{\Sigma}_{ii} + S_{ii} + \lambda = 0,$$ and we get $$\hat{\Sigma}_{ii} = S_{ii} + \lambda.$$
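This identity can be checked on a case with a closed-form solution: when $\mathbf S$ is diagonal, $\hat{\mathbf K} = \operatorname{diag}(1/(S_{ii}+\lambda))$ solves the problem, and its inverse indeed has diagonal $S_{ii} + \lambda$ (a minimal sketch; the example values are ours):

```python
import numpy as np

# Illustrative diagonal example: for diagonal S, K_hat = diag(1 / (S_ii + lam))
# solves the Graphical Lasso, so Sigma_hat = K_hat^{-1} has diagonal S_ii + lam.
lam = 0.5
S = np.diag([1.0, 2.0, 3.0])
K_hat = np.diag(1.0 / (np.diag(S) + lam))
Sigma_hat = np.linalg.inv(K_hat)
print(np.allclose(np.diag(Sigma_hat), np.diag(S) + lam))  # True
```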

This exact relationship between the diagonals of the estimated and empirical covariance matrices is used as the initialization in solvers of the optimization problem above, such as glasso (Friedman et al. 2007).
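A sketch of that warm start, assuming the solver maintains a working covariance estimate (the helper name `glasso_init` is ours):

```python
import numpy as np

def glasso_init(S, lam):
    """Warm start for the estimated covariance: the empirical covariance with
    lam added to the diagonal, so the exact relation Sigma_hat_ii = S_ii + lam
    derived from the optimality conditions already holds at initialization."""
    return S + lam * np.eye(S.shape[0])
```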