Locating and Classifying Objects in Images
Whereas classifiers simply assign a label to an image, object detectors must both locate and classify the objects within it.
Many detection frameworks have been published, and they broadly fall into two families: two-stage and single-stage detectors. Two-stage detectors are usually more accurate, while single-stage detectors are faster. A two-stage detector first generates a small set of candidate object locations from the image, then applies a classifier to each candidate; a single-stage detector predicts locations and classes in one pass through the network.
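The contrast between the two families can be sketched in a few lines. This is an illustrative toy, not real detector code: `propose_regions`, `classify_region`, and `predict_grid` are hypothetical stand-ins for the neural networks a real framework would use.

```python
# Toy stand-ins so the sketch runs; a real detector uses CNNs here.
def propose_regions(image):
    # Stage 1 of a two-stage detector: candidate boxes (x1, y1, x2, y2).
    return [(0, 0, 10, 10), (5, 5, 20, 20)]

def classify_region(image, box):
    # Stage 2: classify one candidate box.
    return "object" if box[2] - box[0] > 5 else "background"

def predict_grid(image):
    # A single-stage detector emits boxes and classes in one pass.
    return [((0, 0, 10, 10), "object")]

def two_stage_detect(image):
    # Propose first, then classify each proposal individually.
    proposals = propose_regions(image)
    return [(box, classify_region(image, box)) for box in proposals]

def single_stage_detect(image):
    # No separate proposal step: one network does everything.
    return predict_grid(image)
```

The structural point is that the two-stage pipeline pays for its extra accuracy by running a classifier once per candidate region, while the single-stage pipeline amortizes all of that work into a single forward pass.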
Popular Object Detection Frameworks
R-CNN: Two-stage detector that produces "region proposals," which are then fed into a classifier neural net. Its successors, Fast R-CNN and Faster R-CNN, have since been released. Faster R-CNN replaces the selective search algorithm with a learned region proposal network, which generates proposals much faster.
SSD: Single Shot Detector. Less accurate than two-stage detectors but faster, with all computation handled by a single network. TensorFlow's object detection API supports both SSD and Faster R-CNN.
YOLO: You Only Look Once. Faster and more accurate than SSD. The most recent version is YOLOv3. Like SSD, it uses no region proposals; a single network handles both localization and classification. See Darknet, the C implementation of YOLO written by its creator.
In my experience with the synthetic datasets I created, YOLO did indeed outperform the TensorFlow SSD implementation.
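Single-stage detectors like SSD and YOLO emit many overlapping boxes for the same object, so both rely on non-maximum suppression (NMS) as a post-processing step. A minimal pure-Python sketch of greedy NMS, assuming boxes are `(x1, y1, x2, y2)` tuples:

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, threshold=0.5):
    # Greedily keep the highest-scoring box, then drop any remaining
    # box that overlaps a kept box by more than the IoU threshold.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep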
RetinaNet: More accurate than YOLO and SSD but slower, while still being a single-stage detector. It uses focal loss to prevent "the vast number of easy negatives from overwhelming the detector during training," i.e., to keep easy background examples from dominating the loss.
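The focal loss from the RetinaNet paper is FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class; the (1 - p_t)^gamma factor shrinks the loss on examples the model already classifies confidently. A minimal binary sketch:

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    # Binary focal loss: p is the predicted probability of the
    # positive class, y is the true label (0 or 1).
    # p_t is the probability assigned to the correct class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t)**gamma factor down-weights easy, well-classified
    # examples; with gamma = 0 this reduces to weighted cross-entropy.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

An easy negative (e.g. p = 0.01, y = 0) contributes almost nothing, while a hard positive (e.g. p = 0.1, y = 1) still produces a substantial loss, which is exactly the behavior the quoted sentence describes.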
[Figure: speed/accuracy comparison of detectors, adapted from the RetinaNet paper to include YOLOv3's results]