9.2 Motivation

Three advantage of convolution * Sparse interpretation * Parameter Sharing * Equivariant Representation

Sparse Interpretation

  • Traditional Matrix Multiplication: Every output unit interact with every inout unit.
  • Convolution: Sparse connectivity

We need to store fewer parameters, which improves

  • Memory requirement
  • statistical efficiency
../../_images/Figure9.2.PNG ../../_images/Figure9.3.PNG ../../_images/Figure9.4.PNG

This allows the network to efficiently describe complicated interaction between many variables by construction such interaction from simple building blocks that each describe only sparse interaction.

Parameter sharing

Use the same parameter for more than one function in a model.

  • Matrix Multiplication: Each element of the weight matrix is used exactly once when computing the output. It is multiplied by one element of the input and never revisited.
  • Convolution: Tied weights, value of weight applied to one input is tied the value of a weight applied to applied elsewhere. Rather than learning a seperate set of parameters for every location, we learn only one set. Dramatically decrease the storage requriements.

Combine sparse connectivity and parameter sharing:


Equivariance to Translation

If the input changes, the output changes in the same way. Specifically, a function f(x) is equivariant to function g if \(f(g(x)) = g(f(x))\). Eg:

  • g: shift image
  • f: convolution

When processing time-serise data: convolution produces a sort of timeline that shows when different features appear in the input. If we move an event later in time in the input, the exact same representation of it will appear in the outout

Convolution is not natually quivariant to some other transformation such as changes in scale or rotation of an image.

Enables processing flexible shaped data

Discussed in 9.7