Naoki Yokoyama

The Problem with Fully-Connected Networks

A fully connected neural net

  • Fully-connected layers learn global patterns in a training sample; ordering matters!
  • Fully-connected models have to relearn a pattern as if it were new whenever it appears elsewhere in the input, e.g. if the input is shifted slightly (see the sketch after this list).
  • Not suited for learning local features placed arbitrarily in an image!
  • We need a model that is not sensitive to the locations at which features appear in the image.
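
    A minimal NumPy sketch (not from the original post) of this location problem: shifting a small feature by two pixels changes most of the active positions in the flattened vector, so weights a Dense layer learned for one position say little about the other. The 8x8 image and the 2-pixel shift are made-up values for illustration.

    import numpy as np

    img = np.zeros((8, 8), dtype=np.float32)
    img[1:4, 1:4] = 1.0                 # a small bright feature
    shifted = np.roll(img, 2, axis=1)   # same feature, moved 2 pixels right

    v1 = img.flatten()
    v2 = shifted.flatten()

    # The active positions barely overlap, so a fully-connected layer sees
    # the shifted feature as an almost entirely new input pattern.
    overlap = np.sum((v1 > 0) & (v2 > 0)) / np.sum(v1 > 0)
    print("fraction of overlapping active inputs:", overlap)  # 1/3 here
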
How Convolutional Neural Nets Work

    A convolutional layer with a stride of 2, output depth of 2. Usually these layers will have a stride of 1 though.

  • Uses a small window (usually 3x3 or 5x5 pixels) that slides over the input and processes each patch independently.
  • Thus, each result does not depend on where the window is in the image, just the patch of pixels it captures.
  • Each patch is transformed via a tensor product with a learned kernel, producing one value per output filter.
  • These values form a vector along the depth dimension, which is placed at the corresponding spatial location of the output (see the sketch after this list).
  • Output of a convolutional layer is usually activated using ReLU.
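
    A minimal NumPy sketch of what a single convolutional filter computes, assuming a 3x3 kernel, stride 1, and "valid" padding. The image and kernel values are random placeholders; a real layer learns the kernel and repeats this once per output channel (the output depth).

    import numpy as np

    def conv2d_single(image, kernel):
        kh, kw = kernel.shape
        oh = image.shape[0] - kh + 1        # "valid" output height
        ow = image.shape[1] - kw + 1        # "valid" output width
        out = np.zeros((oh, ow), dtype=np.float32)
        for i in range(oh):
            for j in range(ow):
                patch = image[i:i + kh, j:j + kw]
                out[i, j] = np.sum(patch * kernel)  # dot product with the kernel
        return out

    image  = np.random.rand(28, 28).astype(np.float32)
    kernel = np.random.rand(3, 3).astype(np.float32)
    print(conv2d_single(image, kernel).shape)  # (26, 26) for a 28x28 input
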
Max Pooling

    The input grid represents the green grids from the previous image.

  • Outputs of convolutional layers are downsampled, discarding unhelpful information while allowing important features to persist.
  • Shrinking the outputs also lets later layers see a larger portion of the original image, so the model can learn features larger than what a single 3x3 kernel captures.
  • Max pooling layers are used to shrink the outputs.
  • Similar to convolutional layers, a sliding window is used, usually 2x2.
  • However, instead of using learned weights, a max-pooling window simply outputs the maximum value it captures (see the sketch after this list).
  • Max pooling is preferred to average pooling to prevent prominent features from being degraded.
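
    A minimal NumPy sketch of 2x2 max pooling with stride 2 on a single feature map: each non-overlapping 2x2 window is replaced by its largest value, halving the spatial size while keeping the most prominent responses. The 4x4 input values are made up for illustration.

    import numpy as np

    def max_pool_2x2(feature_map):
        h, w = feature_map.shape
        out = np.zeros((h // 2, w // 2), dtype=feature_map.dtype)
        for i in range(0, h - 1, 2):
            for j in range(0, w - 1, 2):
                out[i // 2, j // 2] = feature_map[i:i + 2, j:j + 2].max()
        return out

    fmap = np.arange(16, dtype=np.float32).reshape(4, 4)
    print(max_pool_2x2(fmap))
    # [[ 5.  7.]
    #  [13. 15.]]
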
Beyond Convolutional Neural Networks

    These images may look the same to a CNN.

  • CNNs are a little TOO translation invariant (see image above).
  • More recently, Geoffrey Hinton introduced the concept of Capsule Networks, which take spatial hierarchies into consideration when looking for patterns.
  • CNNs are still sensitive to variations such as rotation and scale.
  • To combat this, data augmentation techniques expand the training set by creating additional variants of each training sample, tweaking rotation, scale, blur, noise, color, etc. (a sketch follows this list).
  • From Hinton's Reddit AMA in r/MachineLearning: "The pooling operation used in convolutional neural networks is a big mistake and the fact that it works so well is a disaster."
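
    A hedged sketch of the data-augmentation idea using Keras' ImageDataGenerator; the rotation/shift/zoom ranges below are illustrative choices, not values from this post. The generator yields randomly perturbed variants of the training images, so the network sees the same digit at slightly different poses.

    from keras.preprocessing.image import ImageDataGenerator

    augment = ImageDataGenerator(rotation_range=15,       # small random rotations
                                 width_shift_range=0.1,   # horizontal jitter
                                 height_shift_range=0.1,  # vertical jitter
                                 zoom_range=0.1)          # mild scale changes

    # Usage sketch: train_im/train_labels are the (60000, 28, 28, 1) arrays
    # prepared in the convolutional example below, and net is that model.
    # net.fit_generator(augment.flow(train_im, train_labels, batch_size=128),
    #                   steps_per_epoch=len(train_im) // 128, epochs=5)
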
Fully Connected Example

    # Prepare training/test set
    from keras.datasets import mnist
    (train_im,train_labels),(test_im,test_labels) = mnist.load_data()
    train_im = train_im.reshape((60000,28*28))
    train_im = train_im.astype('float32')/255
    test_im  = test_im.reshape((10000,28*28))
    test_im  = test_im.astype('float32')/255
    from keras.utils import to_categorical
    train_labels = to_categorical(train_labels)
    test_labels  = to_categorical(test_labels)
    
    # Create neural net
    from keras import models, layers
    net = models.Sequential()
    # Input, fully connected layer
    net.add(layers.Dense(512,activation="relu",input_shape=(28*28,)))
    # Output, classification layer
    net.add(layers.Dense(10,activation="softmax"))
    # Add loss/optimizer
    net.compile(optimizer='rmsprop',
    		loss='categorical_crossentropy',
    		metrics=['accuracy'])
    # Train
    net.fit(train_im,train_labels,epochs=5,batch_size=128)
    loss,acc = net.evaluate(test_im, test_labels)
    print(acc)
    

    Accuracy on test set: 97.8%
    As you can see, a 2D image must be flattened into a 1D vector for use with fully connected layers.

Convolutional Example

    # Prepare training/test set
    from keras.datasets import mnist
    (train_im,train_labels),(test_im,test_labels) = mnist.load_data()
    train_im = train_im.reshape((60000,28,28,1))
    train_im = train_im.astype('float32')/255
    test_im  = test_im.reshape((10000,28,28,1))
    test_im  = test_im.astype('float32')/255
    from keras.utils import to_categorical
    train_labels = to_categorical(train_labels)
    test_labels  = to_categorical(test_labels)
    
    # Create neural net
    from keras import models, layers
    net = models.Sequential()
    # Input, convolutional layer
    net.add(layers.Conv2D(32,(3,3),activation="relu",input_shape=(28,28,1)))
    net.add(layers.MaxPooling2D((2,2)))
    # Hidden convolutional layer #1
    net.add(layers.Conv2D(64,(3,3),activation="relu"))
    net.add(layers.MaxPooling2D((2,2)))
    # Hidden convolutional layer #2
    net.add(layers.Conv2D(64,(3,3),activation="relu"))
    # Flatten for classification
    net.add(layers.Flatten())
    # Classification layer
    net.add(layers.Dense(64,activation='relu'))
    net.add(layers.Dense(10,activation='softmax'))
    # Add loss/optimizer
    net.compile(optimizer='rmsprop',
    		loss='categorical_crossentropy',
    		metrics=['accuracy'])
    # Train
    net.fit(train_im,train_labels,epochs=5,batch_size=128)
    loss,acc = net.evaluate(test_im, test_labels)
    print(acc)
    

    Accuracy on test set: 99%
    With CNNs, 2D images are processed as they are, without being flattened into a 1D vector. This may not seem like much of a bump, but this is a classification task in which the objects (MNIST digits) are all roughly the same size and centered. Convolutional layers shine brighter at detection tasks, where their robustness to the arbitrary placement of features is far more useful.