Sunday, December 29, 2019

YOLO - Computer Vision

I recently stumbled upon the You Only Look Once (YOLO) computer vision algorithm, which shows some remarkable results.  This post gives a brief introduction to the system and some examples from the limited time I've spent with it recently.

YOLO takes a different stance from earlier detectors that repurpose a classifier as a detector: it treats detection as a single regression problem, so the network only looks at the whole image once.  In short, the algorithm splits a frame into an SxS grid of cells, and the cell that contains the center of an object is responsible for detecting it.  In that single pass, the network predicts a few bounding boxes for each cell, each with a confidence score, along with a probability for every class it knows about.  So, suppose a cell holds the center of a dog: the 'cat' probability comes back low, the 'bowling ball' probability also comes back low, ..., and the 'dog' probability comes back high.  The box is then tagged as a dog.  In the original YOLO design, each grid cell predicts only one set of class probabilities, so it can report no more than one kind of object.  Highest score wins.
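To make the grid idea concrete, here is a minimal Python sketch (not darknet's actual code) of the two steps described above: finding which cell of an SxS grid holds an object's center, and picking the label with the highest score.  The function names, the 7x7 grid, the image size, and the scores are all made up for illustration.

```python
def responsible_cell(center_x, center_y, image_w, image_h, s=7):
    """Return (row, col) of the grid cell containing the point (center_x, center_y)."""
    col = min(int(center_x * s / image_w), s - 1)  # clamp points on the right edge
    row = min(int(center_y * s / image_h), s - 1)  # clamp points on the bottom edge
    return row, col


def best_label(class_scores):
    """Return the (label, score) pair with the highest score."""
    label = max(class_scores, key=class_scores.get)
    return label, class_scores[label]


# Hypothetical example: a dog centered at (300, 200) in a 448x448 image.
row, col = responsible_cell(300, 200, 448, 448)
scores = {"cat": 0.03, "bowling ball": 0.01, "dog": 0.95}  # invented scores
print(row, col, best_label(scores))  # prints: 3 4 ('dog', 0.95)
```

The real network produces all of these numbers at once for every cell; the sketch only shows how a single cell's answer would be read.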

The rest of this post will focus on quickly setting up YOLO and running it on a series of test images.  Essentially, 3 steps: 1) download and build darknet (an open-source neural network framework), 2) download the YOLO neural net weights, 3) run YOLO on a series of images.  Let's get started.

Install Darknet

$ git clone https://github.com/pjreddie/darknet
Cloning into 'darknet'...
remote: Enumerating objects: 5901, done.
remote: Total 5901 (delta 0), reused 0 (delta 0), pack-reused 5901
Receiving objects: 100% (5901/5901), 6.16 MiB | 4.44 MiB/s, done.
Resolving deltas: 100% (3915/3915), done.
Checking connectivity... done.
$ make -C darknet
...

Download YOLO Weights

$ wget https://pjreddie.com/media/files/yolov3.weights -O darknet/yolov3.weights

Run on Images

$ cd darknet
$ ./darknet detect cfg/yolov3.cfg yolov3.weights ~/Photos/image01.jpg
$ display predictions.jpg

The predictions image surrounds each detected object with a bounding box and a label, like this:


Running YOLO on the above photo produces the following output and predictions image:
/home/lipeltgm/Downloads/nature-cats-dogs_t800.jpg: Predicted in 76.252667 seconds.
dog: 95%
cat: 94%
person: 99%
person: 99%

YOLO found 4 objects, each with high confidence: 1 cat, 1 dog, and 2 people.
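If you want to work with these results in a script, the text output is easy to parse.  The following is a rough Python sketch that assumes the output looks exactly like the lines above (a "Predicted in" line followed by "label: NN%" lines); parse_detections is a hypothetical helper name, not part of darknet.

```python
import re

# Matches lines such as "dog: 95%"; the "Predicted in" line does not end in '%'.
DETECTION = re.compile(r"^(?P<label>[^:]+): (?P<pct>\d+)%$")


def parse_detections(output, min_confidence=0):
    """Return [(label, percent), ...] for detections at or above min_confidence."""
    found = []
    for line in output.splitlines():
        match = DETECTION.match(line.strip())
        if match and int(match["pct"]) >= min_confidence:
            found.append((match["label"], int(match["pct"])))
    return found


sample = """/home/lipeltgm/Downloads/nature-cats-dogs_t800.jpg: Predicted in 76.252667 seconds.
dog: 95%
cat: 94%
person: 99%
person: 99%"""

print(parse_detections(sample))
print(parse_detections(sample, min_confidence=95))
```

The min_confidence argument is there so that weaker detections can be dropped when reviewing a large batch.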

Running YOLO on my existing personal photos (~6400 images) and reviewing the results ad hoc looks extremely promising; results follow.

Without any pre-processing or prep, I ran the YOLO detector against my personal archive of photos: vacations, camping trips, weddings, and so on.  The process took a couple of days because I launched the darknet detect process individually for each photo, so the weights were reloaded every time, which significantly slowed things down.  I wasn't really interested in performance so much as in the detections themselves.
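For reference, a per-photo run like the one described can be driven from a small Python script.  This is a sketch, not the script actually used for the run: the function names, the darknet directory location, the list of image suffixes, and the log file name are assumptions, and only the darknet command line itself comes from the steps earlier in this post.  As in the original run, darknet is started once per image, so the weights are reloaded every time.

```python
import subprocess
from pathlib import Path

DARKNET_DIR = Path.home() / "darknet"  # assumed location of the darknet checkout
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set of photo formats


def find_images(root):
    """Return all image files under root, sorted for a repeatable order."""
    return sorted(p for p in Path(root).rglob("*")
                  if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)


def build_command(image_path):
    """Build the same darknet command line used earlier in this post."""
    return ["./darknet", "detect", "cfg/yolov3.cfg", "yolov3.weights", str(image_path)]


def run_all(photo_root, log_path):
    """Run darknet once per image and append its standard output to log_path."""
    with open(log_path, "a") as log:
        for image in find_images(photo_root):
            # cwd matters: darknet looks up cfg/ and data/ relative to the
            # working directory, so run it from inside the checkout.
            subprocess.run(build_command(image.resolve()), cwd=DARKNET_DIR,
                           stdout=log, check=False)


# Example (not executed here): run_all(Path.home() / "Photos", "bigrun.log")
```

Using check=False means one unreadable photo does not stop the whole batch; whether that is the right trade-off depends on how much you trust the archive.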

Here are the types of objects found in my photos:
lipeltgm@kaylee:~$ grep "^.*:" ./blog/YOLO/darknet/bigrun.log | grep -v Predic | cut -f 1 -d ':' | sort | uniq -c | sort -n
      1 baseball glove
      1 broccoli
      1 hot dog
      1 kite
      1 scissors
      2 donut
      2 mouse
      2 parking meter
      3 apple
      3 banana
      3 orange
      3 pizza
      3 skateboard
      3 snowboard
      3 zebra
      4 sandwich
      4 toothbrush
      5 train
      6 bus
      6 skis
      6 stop sign
      7 baseball bat
      7 fork
      8 giraffe
      8 toilet
      9 cow
      9 knife
      9 microwave
      9 spoon
      9 surfboard
     10 frisbee
     10 remote
     12 tennis racket
     14 aeroplane
     15 elephant
     16 oven
     18 motorbike
     18 sink
     20 wine glass
     21 vase
     22 fire hydrant
     26 bicycle
     26 sheep
     29 cake
     31 refrigerator
     34 cat
     34 suitcase
     36 teddy bear
     38 sports ball
     40 horse
     43 laptop
     49 cell phone
     54 traffic light
     64 bear
     70 bowl
     77 bed
     77 clock
     88 pottedplant
    103 bird
    106 backpack
    112 handbag
    124 sofa
    124 tvmonitor
    129 umbrella
    156 dog
    170 bottle
    187 book
    194 diningtable
    204 tie
    237 bench
    275 truck
    304 cup
    355 boat
   1029 chair
   1333 car
  14683 person
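The same tally can be produced in Python instead of the grep/cut/sort/uniq pipeline.  A minimal sketch, assuming the log contains "label: NN%" lines like the output shown earlier; count_labels and print_tally are hypothetical helper names.

```python
import re
from collections import Counter

# A detection line looks like "dog: 95%"; "Predicted in" lines do not match.
DETECTION = re.compile(r"^([^:]+): \d+%$")


def count_labels(lines):
    """Count how many times each label appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = DETECTION.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


def print_tally(counts):
    """Print counts in ascending order, similar to `sort | uniq -c | sort -n`."""
    for label, n in sorted(counts.items(), key=lambda item: (item[1], item[0])):
        print(f"{n:7d} {label}")


sample_log = [
    "/home/lipeltgm/Downloads/nature-cats-dogs_t800.jpg: Predicted in 76.252667 seconds.",
    "dog: 95%",
    "cat: 94%",
    "person: 99%",
    "person: 99%",
]
print_tally(count_labels(sample_log))
```

To run it against a real log, pass an open file instead of the sample list, for example `with open("bigrun.log") as f: print_tally(count_labels(f))`.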


Gotta say, pretty cool; it located a number of random objects I didn't realize I had photos of.  Who knew I had a photo of zebras?  But in fact I really do.  Disney World is amazing:


Have fun with it!!

