YOLO takes the stance of using a classifier as a detector. In short, the algorithm splits a frame into an SxS grid of cells and treats each cell as responsible for any object whose center falls inside it. A single pass of the neural network then proposes a series of bounding boxes for each cell, along with a confidence score for every class it knows about, say 0-100. So, suppose a bounding box contains a dog: the 'cat' score for that box comes back low, the 'bowling ball' score also comes back low, and so on, while the 'dog' score comes back high. The box would then be tagged as a dog. In its original form, the algorithm assumes each grid cell holds no more than one object. Highest confidence score wins.
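To make the "highest confidence wins" idea concrete, here is a tiny sketch. The scores are made-up numbers for a single bounding box and `best_label` is a hypothetical helper name; this only illustrates the selection step, not how darknet computes it internally.

```shell
#!/bin/sh
# Hypothetical helper: read "label score" lines on stdin and print the
# label with the highest score (labels must not contain spaces here).
best_label() {
    sort -k2,2nr | head -n 1 | cut -d' ' -f1
}

# Made-up scores for one bounding box that happens to contain a dog.
printf '%s\n' 'cat 3' 'bowling_ball 1' 'dog 97' | best_label
```

The sort orders the lines by score, largest first, so the first line left is the winning label.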
The rest of this post will focus on quickly setting up YOLO and running it on a series of test images. Essentially, 3 steps: 1) download and build darknet (an open-source neural network framework), 2) download the YOLO neural net weights, 3) run YOLO on a series of images. Let's get started.
Install Darknet
$ git clone https://github.com/pjreddie/darknet
Cloning into 'darknet'...
remote: Enumerating objects: 5901, done.
remote: Total 5901 (delta 0), reused 0 (delta 0), pack-reused 5901
Receiving objects: 100% (5901/5901), 6.16 MiB | 4.44 MiB/s, done.
Resolving deltas: 100% (3915/3915), done.
Checking connectivity... done.
$ (cd darknet && make)
...
Download YOLO Weights
$ wget https://pjreddie.com/media/files/yolov3.weights -O darknet/yolov3.weights
Run on Images
$ cd darknet
$ ./darknet detect cfg/yolov3.cfg yolov3.weights ~/Photos/image01.jpg
$ display predictions.jpg
The predictions image will surround detected objects with bounding boxes and a label, like this:
Running YOLO on the above photo will result in the following output and predictions image:
/home/lipeltgm/Downloads/nature-cats-dogs_t800.jpg: Predicted in 76.252667 seconds.
dog: 95%
cat: 94%
person: 99%
person: 99%
YOLO found 4 objects, with high confidence for each: 1 cat, 1 dog and 2 people.
I then ran it on my existing personal photos (~6400 images), and an ad hoc review of the results looks extremely promising. Results follow:
Without any pre-processing or prep, I ran the YOLO detector against my personal archive of photos, some 6400 images of vacations, camping trips, weddings, and so on. The run took a couple of days because I launched the darknet detect process individually for each photo, so the weights were reloaded every time, which slowed things down significantly. I wasn't really interested in performance, though, so much as in the detections themselves.
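A run like that can be sketched roughly as the loop below. This is a minimal sketch, not the exact script I used: `run_batch` and the `DARKNET_CMD` override are hypothetical names introduced for illustration, and it assumes you run it from inside the darknet directory so the cfg and weights paths resolve. It launches one detect process per photo, so it reloads the weights every time, just as described above.

```shell
#!/bin/sh
# Hypothetical helper: run the detector once per JPEG under a directory
# and append whatever it prints on stdout to a single log file.
run_batch() {
    photo_dir=$1
    log_file=$2
    find "$photo_dir" -type f \( -iname '*.jpg' -o -iname '*.jpeg' \) | sort |
    while IFS= read -r img; do
        # DARKNET_CMD is an assumption of this sketch; it defaults to the
        # ./darknet binary built earlier. stdin is redirected so the
        # detector cannot consume the list of file names.
        ${DARKNET_CMD:-./darknet} detect cfg/yolov3.cfg yolov3.weights "$img" \
            < /dev/null >> "$log_file"
    done
}
```

Usage would be something like `run_batch ~/Photos bigrun.log`. Keep in mind that each run overwrites predictions.jpg, so only the text log survives the batch.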
Here are the types of objects found in my photos:
lipeltgm@kaylee:~$ grep "^.*:" ./blog/YOLO/darknet/bigrun.log | grep -v Predic | cut -f 1 -d ':' | sort | uniq -c | sort -n
1 baseball glove
1 broccoli
1 hot dog
1 kite
1 scissors
2 donut
2 mouse
2 parking meter
3 apple
3 banana
3 orange
3 pizza
3 skateboard
3 snowboard
3 zebra
4 sandwich
4 toothbrush
5 train
6 bus
6 skis
6 stop sign
7 baseball bat
7 fork
8 giraffe
8 toilet
9 cow
9 knife
9 microwave
9 spoon
9 surfboard
10 frisbee
10 remote
12 tennis racket
14 aeroplane
15 elephant
16 oven
18 motorbike
18 sink
20 wine glass
21 vase
22 fire hydrant
26 bicycle
26 sheep
29 cake
31 refrigerator
34 cat
34 suitcase
36 teddy bear
38 sports ball
40 horse
43 laptop
49 cell phone
54 traffic light
64 bear
70 bowl
77 bed
77 clock
88 pottedplant
103 bird
106 backpack
112 handbag
124 sofa
124 tvmonitor
129 umbrella
156 dog
170 bottle
187 book
194 diningtable
204 tie
237 bench
275 truck
304 cup
355 boat
1029 chair
1333 car
14683 person
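The same log can also answer "which photos had a zebra in them?". The sketch below assumes the log looks like the sample output shown earlier: one "Predicted in" line per image, followed by one "label: NN%" line per detection. `find_label` is a hypothetical helper name, and the parsing would need adjusting if a file path itself contained ": Predicted in ".

```shell
#!/bin/sh
# Hypothetical helper: print each image path in the log that has at
# least one detection with the given label.
find_label() {
    label=$1
    log_file=$2
    awk -v want="$label" '
        # A "Predicted in" line starts a new image; remember its path.
        /: Predicted in / { sub(/: Predicted in .*/, ""); img = $0; next }
        {
            # Detection lines look like "label: NN%".
            n = index($0, ": ")
            if (n > 0 && substr($0, 1, n - 1) == want && img != last) {
                print img
                last = img   # report each image only once
            }
        }
    ' "$log_file"
}
```

For example, `find_label zebra bigrun.log` would list the photos behind the 3 zebra detections above.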
Gotta say, pretty cool, and it located a number of random objects I didn't realize I had photos of. Who knew I had a photo of zebras? But in fact I really do. DisneyWorld is amazing:
Have fun with it!!