Personal notes from the lectures by Prof. Song Han on Efficient ML.
Link to the original slides.
Every neuron in the output layer is connected to every neuron in the previous layer, resulting in a dense weight matrix.
Input Features: $\mathbf{X} = (n, c_i)$
Weight Matrix: $\mathbf{W} = (c_o, c_i)$
Output Features: $\mathbf{Y} = (n, c_o)$
Bias: $\mathbf{b} = (c_o)$
where $c_i$ = number of input channels, $c_o$ = number of output channels, and $n$ = batch size.
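A minimal sketch of these shapes using PyTorch's `nn.Linear` (PyTorch is an assumption here; the lecture itself does not prescribe a framework, and the concrete sizes below are hypothetical):

```python
import torch
import torch.nn as nn

# Hypothetical sizes, matching the notation above:
# n = batch size, c_i = input channels, c_o = output channels
n, c_i, c_o = 4, 16, 32

fc = nn.Linear(in_features=c_i, out_features=c_o, bias=True)

x = torch.randn(n, c_i)    # X: (n, c_i)
y = fc(x)                  # Y = X W^T + b

print(fc.weight.shape)     # W: (c_o, c_i) -> torch.Size([32, 16])
print(fc.bias.shape)       # b: (c_o,)     -> torch.Size([32])
print(y.shape)             # Y: (n, c_o)   -> torch.Size([4, 32])
```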
Each output neuron is connected only to a local receptive field of the input; the size of the receptive field is usually the kernel shape. Based on the shape of the kernels, we have 1D and 2D convolution operations.
1D convolution is simplest to visualize starting from the 2D case: keep the channel dimension matched and discard either the height or the width component.
Input Features, $\mathbf{X} = (n, c_i, w_i)$
Weight Matrix, $\mathbf{W} = (c_o, c_i, k_w)$
Output Features, $\mathbf{Y} = (n, c_o, w_o)$
Bias, $\mathbf{b} = (c_o)$
where, $c_i$ = number of input channels, $c_o$ = number of output channels, n = batch_size and $k_w$ is width of kernel.
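The corresponding sketch for 1D convolution with PyTorch's `nn.Conv1d` (again an assumption, with hypothetical sizes); with stride 1 and no padding the output width is $w_o = w_i - k_w + 1$:

```python
import torch
import torch.nn as nn

# Hypothetical sizes, matching the notation above:
# n = batch size, c_i/c_o = input/output channels,
# w_i = input width, k_w = kernel width
n, c_i, c_o, w_i, k_w = 4, 16, 32, 10, 3

conv = nn.Conv1d(in_channels=c_i, out_channels=c_o, kernel_size=k_w, bias=True)

x = torch.randn(n, c_i, w_i)   # X: (n, c_i, w_i)
y = conv(x)

print(conv.weight.shape)       # W: (c_o, c_i, k_w) -> torch.Size([32, 16, 3])
print(conv.bias.shape)         # b: (c_o,)          -> torch.Size([32])
print(y.shape)                 # Y: (n, c_o, w_o), w_o = 10 - 3 + 1 = 8
```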