Homework #3
EE 541: Fall 2025
Assigned: 15 September
Due: Sunday, 22 September at 23:59
BrightSpace Assignment: Homework 3
Problem 1: Linear MMSE Estimation
Consider the problem of estimating a scalar random variable \(Y\) from a vector observation \(\mathbf{X} \in \mathbb{R}^n\). We want to find the linear MMSE estimator \(\hat{Y} = \mathbf{w}^T\mathbf{X}\) that minimizes the mean squared error (MSE), \(\mathbb{E}[(Y - \hat{Y})^2]\). Parts (a)-(d) below specialize to the scalar case \(n = 1\).
(a) Given two zero-mean jointly Gaussian random variables \(X\) and \(Y\) with covariance matrix \[ \mathbf{K} = \begin{bmatrix} 5 & 2 \\ 2 & 4\end{bmatrix}, \] find the linear MMSE estimator \(\hat{Y} = w^*X\) for \(Y\) given \(X\). That is, find the optimal weight \(w^*\) that minimizes the MSE.
(b) Calculate the minimum mean squared error (MMSE) achieved by the optimal estimator \(\hat{Y} = w^*X\).
(c) Show that for jointly Gaussian random variables, the linear MMSE estimator found in part (a) is equivalent to the conditional expectation \(\mathbb{E}[Y|X]\). In other words, prove that \(w^*X = \mathbb{E}[Y|X]\).
(d) Now suppose \(X\) and \(Y\) are not jointly Gaussian but have the same covariance matrix \(\mathbf{K}\) as above. Find the linear MMSE estimator \(\hat{Y} = \tilde{w}X\) in this case. Is the MMSE achieved by \(\tilde{w}\) different from the jointly Gaussian case in part (b)? Explain why or why not.
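If you want a quick numerical sanity check of parts (a) and (b) (optional; the graded work is analytical), the zero-mean scalar formulas \(w^* = \operatorname{Cov}(X,Y)/\operatorname{Var}(X)\) and \(\text{MMSE} = \operatorname{Var}(Y) - \operatorname{Cov}(X,Y)^2/\operatorname{Var}(X)\) can be evaluated directly. The sketch below assumes the first row/column of \(\mathbf{K}\) corresponds to \(X\) and the second to \(Y\).

```python
import numpy as np

# Covariance matrix of (X, Y); assumes the first row/column is X, the second is Y.
K = np.array([[5.0, 2.0],
              [2.0, 4.0]])

var_x, var_y, cov_xy = K[0, 0], K[1, 1], K[0, 1]

w_star = cov_xy / var_x                  # zero-mean linear MMSE weight
mmse = var_y - cov_xy**2 / var_x         # resulting minimum MSE

print(f"w* = {w_star:.4f}, MMSE = {mmse:.4f}")
```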
Problem 2: Eigenanalysis of Covariance Matrix and PCA
Consider a zero-mean random vector \(\mathbf{X} \in \mathbb{R}^3\) with covariance matrix \[ \mathbf{K} = \begin{bmatrix} 4 & -1 & 2 \\ -1 & 5 & -1 \\ 2 & -1 & 3 \end{bmatrix}. \]
(a) Find the eigenvalues \(\lambda_k\) and orthonormal eigenvectors \(\mathbf{e}_k\) of \(\mathbf{K}\).
(b) Show that the covariance matrix \(\mathbf{K}\) can be expressed in terms of its eigenvalues and eigenvectors using the spectral decomposition (this is a special case of Mercer’s theorem): \[ \mathbf{K} = \sum_{k=1}^3 \lambda_k \mathbf{e}_k \mathbf{e}_k^T. \]
(c) Express \(\mathbf{X}\) using its Karhunen-Loève expansion (KL expansion), i.e. \[ \mathbf{X} = \sum_{k=1}^3 Z_k \mathbf{e}_k, \] where \(Z_k\) are uncorrelated random variables with zero mean and variance equal to the corresponding eigenvalues \(\lambda_k\). This expansion is closely related to Principal Component Analysis (PCA), where the eigenvectors of the covariance matrix are called principal components and the eigenvalues represent the variance, often interpreted as “power,” captured by each component.
(d) Suppose you want to approximate \(\mathbf{X}\) using only its two dominant eigenmodes (i.e. the two principal components with the largest eigenvalues). Write the approximation \(\tilde{\mathbf{X}}\) in terms of the eigenvectors and eigenvalues of \(\mathbf{K}\). This is an example of dimensionality reduction using PCA.
(e) What is the mean squared error (MSE) of the approximation in (d), i.e. \[ \mathbb{E}[\lVert\mathbf{X} - \tilde{\mathbf{X}}\rVert^2]? \] Express your answer in terms of the eigenvalues. This MSE is related to the concept of reconstruction error in PCA and the total variance captured by the selected principal components.
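As an optional numerical check of your answers, the eigendecomposition, the spectral reconstruction, and the rank-two approximation error can be verified in a few lines of numpy (the numpy.linalg restriction in Problem 3 does not apply here). A minimal sketch:

```python
import numpy as np

K = np.array([[ 4., -1.,  2.],
              [-1.,  5., -1.],
              [ 2., -1.,  3.]])

# Eigenvalues (ascending) and orthonormal eigenvectors (columns of E)
lam, E = np.linalg.eigh(K)

# Spectral decomposition: K should equal sum_k lam_k e_k e_k^T
K_rebuilt = sum(lam[k] * np.outer(E[:, k], E[:, k]) for k in range(3))
print("spectral decomposition holds:", np.allclose(K, K_rebuilt))

# Dropping the smallest eigenmode, the approximation error
# E||X - X_tilde||^2 equals the discarded eigenvalue.
print("MSE of two-component approximation:", lam[0])
```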
Problem 3: Polynomial Regression
Use only Python standard library modules, numpy, and matplotlib for this problem. Do not use numpy.linalg or any built-in regression functions.
Many signal processing applications require separating a clean signal from noisy measurements. In this problem, you will implement polynomial regression using direct matrix calculations to recover a signal corrupted by additive noise.
Generate synthetic data by combining a signal \(f(t)\) with Gaussian noise: \[y(t) = f(t) + n(t),\] where \(f(t) = 0.5 + 0.4t - 0.3t^2 + 0.2t^3\) and \(n(t) \sim \mathcal{N}(0,0.1)\) is white Gaussian noise. Sample 1000 points uniformly on \([0,1]\). Store both \(f(t)\) and \(y(t)\).
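A sketch of one way to generate and store the data follows; it assumes "uniformly" means uniformly spaced samples and that the \(0.1\) in \(\mathcal{N}(0, 0.1)\) is the noise variance (adjust the seed, spacing, or noise scale if you read these differently).

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed (an assumption, for reproducibility)

N = 1000
t = np.linspace(0.0, 1.0, N)              # uniformly spaced samples on [0, 1]
f = 0.5 + 0.4*t - 0.3*t**2 + 0.2*t**3     # clean signal f(t)
n = rng.normal(0.0, np.sqrt(0.1), N)      # white Gaussian noise, taking 0.1 as the variance
y = f + n                                 # noisy observations y(t)

# Keep both arrays for the later parts; the filenames below are placeholders.
np.save("f_clean.npy", f)
np.save("y_noisy.npy", y)
```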
Perform a 70-30 split of your data points into training (first 700 points) and testing (remaining 300 points) sets. For each polynomial degree \(d \in \{3, 5, 7, 9\}\):
- Construct the design matrix \(\mathbf{X}\) where row \(i\) is \([1, t_i, t_i^2, ..., t_i^d]\) using the training data.
- Calculate regression coefficients \(\mathbf{w}\) by solving \((\mathbf{X}^T\mathbf{X})\mathbf{w} = \mathbf{X}^T\mathbf{y}\) using matrix multiplication and back-substitution (one possible approach is sketched after this list). Check (and comment on) whether \(\mathbf{X}^T\mathbf{X}\) is poorly conditioned.
- Evaluate the mean-square error (MSE) on both the training set and test set using these coefficients.
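The following is a minimal sketch of a design matrix and a normal-equations solver that stay within the stated constraints (no numpy.linalg): Gaussian elimination with partial pivoting followed by back-substitution. The helper names (design_matrix, solve_linear_system, mse) and the commented usage are illustrative, not required.

```python
import numpy as np

def design_matrix(t, d):
    """Row i is [1, t_i, t_i^2, ..., t_i^d]."""
    return np.column_stack([t**k for k in range(d + 1)])

def solve_linear_system(A, b):
    """Solve A x = b with Gaussian elimination (partial pivoting)
    and back-substitution; no numpy.linalg calls."""
    A = A.astype(float)
    b = b.astype(float)
    n = A.shape[0]
    for i in range(n):
        p = i + int(np.argmax(np.abs(A[i:, i])))   # pivot row
        A[[i, p]] = A[[p, i]]
        b[[i, p]] = b[[p, i]]
        for j in range(i + 1, n):
            m = A[j, i] / A[i, i]
            A[j, i:] -= m * A[i, i:]
            b[j] -= m * b[i]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):                 # back-substitution
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

# Illustrative usage for one degree d, with t_train, y_train, t_test, y_test
# taken from the 70-30 split described above:
#   X_tr = design_matrix(t_train, d)
#   w = solve_linear_system(X_tr.T @ X_tr, X_tr.T @ y_train)
#   train_mse = mse(y_train, X_tr @ w)
#   test_mse  = mse(y_test, design_matrix(t_test, d) @ w)
# A rough conditioning indicator (numpy.linalg.cond is off-limits here):
# compare the largest and smallest pivots encountered during elimination.
```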
Plot fitted polynomials against the true signal using test data points. Generate a figure showing training and test MSE versus polynomial degree.
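A plotting sketch for the MSE-versus-degree figure, assuming you collected the per-degree errors in lists named degrees, train_errors, and test_errors (hypothetical names; the filename is a placeholder):

```python
import matplotlib.pyplot as plt

# degrees, train_errors, test_errors are assumed to have been built
# during the degree sweep above.
plt.figure()
plt.plot(degrees, train_errors, 'o-', label='training MSE')
plt.plot(degrees, test_errors, 's-', label='test MSE')
plt.xlabel('polynomial degree')
plt.ylabel('MSE')
plt.legend()
plt.savefig('mse_vs_degree.png', dpi=150)
```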
Add \(\ell_2\) regularization with degree \(d=5\) by solving \[(\mathbf{X}^T\mathbf{X} + \alpha\mathbf{I})\mathbf{w} = \mathbf{X}^T\mathbf{y}\] for \(\alpha \in \{ 0.1, 1.0, 10.0 \}\). Check (and comment on) whether \(\mathbf{X}^T\mathbf{X} + \alpha\mathbf{I}\) is poorly conditioned. Plot training and test MSE versus \(\alpha\). Save your best model coefficients (based on test MSE) to `coeff.txt` using `np.savetxt('coeff.txt', w)`.

Consider the MSE decomposition for polynomial regression: \[\mathbb{E}[(y - \hat{f}(x))^2] = \text{Bias}[\hat{f}(x)]^2 + \text{Var}[\hat{f}(x)] + \sigma^2.\] Use your results from the degree sweep and the regularization sweep above to explain how polynomial degree and regularization strength affect this decomposition. Relate the behavior of your fitted polynomials to regions of high bias (underfitting) versus high variance (overfitting). Use specific examples to argue whether regularization is effective in reducing overfitting.
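A sketch of the ridge variant, reusing the helper functions and split data from the earlier sketch (those names are assumptions carried over, not part of the assignment):

```python
import numpy as np

# Assumes design_matrix, solve_linear_system, mse, and the split
# (t_train, y_train, t_test, y_test) from the earlier sketch.
d = 5
X_tr = design_matrix(t_train, d)
X_te = design_matrix(t_test, d)

results = {}
for alpha in (0.1, 1.0, 10.0):
    A = X_tr.T @ X_tr + alpha * np.eye(d + 1)   # regularized normal equations
    w = solve_linear_system(A, X_tr.T @ y_train)
    results[alpha] = (mse(y_train, X_tr @ w), mse(y_test, X_te @ w), w)

# Keep the coefficients with the lowest test MSE, as required.
best_alpha = min(results, key=lambda a: results[a][1])
np.savetxt('coeff.txt', results[best_alpha][2])
```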
Problems 1-2: Analytical problems
Write your solutions to these homework problems on separate sheets of paper. Show all work and box answers where appropriate. Do not guess.
Problem 3: Polynomial Regression
Submit figures, analysis, and discussion to BrightSpace. Submit your Python code and the saved coefficients file (coeff.txt) as an appendix to Gradescope. A suitably annotated Jupyter notebook with inline analysis is sufficient.
Gradescope: Problem 3