Creating all the tech projects in Start-Up Kdrama Part 1: Image Recognition AI

Joyce
8 min read · Jun 12, 2021

Hi! My name is Joyce and I’m a recent CS grad with a lot of time on her hands and an unhealthy obsession with the kdrama Start-Up (#teamHanJipyeong #theGoodestBoy)! Since I’m currently rocking the ~unemployed summer vibe~ I thought it would be fun to try to recreate all the tech projects in the show! This is really just for fun and is by no means the best way to recreate these tech projects, but it’s written so that you need no prior CS background and nothing besides an internet connection and a Google account to follow along.

This first article will go through building a very basic image recognition model, and then in the next post I’ll take you through building a basic object recognition model based on Episode 2. I will be posting more projects throughout the summer, so follow me on Twitter if you want to stay updated: @GuoLikeWhoa

*la la la running running la la la*

Image Recognition App

Early on in the season, Nam Do-san shows his parents what he’s been working on with their investment: an image recognition app that can classify objects it sees from a camera feed. (Though we quickly learn it has a weakness for misidentifying Do-san’s father as a toilet; more on that later.)

Nam Do-san shows off his image recognition app to his parents (S1E2 34:39)

To recreate Do-san’s project, I’ll show you how to create a basic image recognition model that classifies an image you feed it; then, in the next article, I’ll show you how to build an object recognition model that can run over your image and identify objects within it (with the help of TensorFlow and ImageNet). If you follow to the end, you’ll be able to spin up a model like Do-san’s based on static images (and perhaps your parents will forgive you for splurging their investment on a nice coffeemaker too).

What you will need:

  1. Colab account (free to obtain): https://colab.research.google.com/

Colab is a free interactive Python environment that Google provides, no credit card or payment necessary! It’s essentially a Jupyter notebook, so it just contains Python code. If you’d like, you can also copy the code into a local Python script and run it that way if that’s easier for you.

  2. Code from this repository: StartUp_ImageRecognition

In this GitHub repo are three important things: a folder called Data with three pictures from the Start-Up show that we will use as our example images, a Python notebook titled “Start-Up_ImageRecognition_notebook_part1,” and a Python notebook titled “Start-Up_ImageRecognition_notebook_part2.” Follow the readme for instructions on how to load the Data folder into your Google Drive. Then you can open the “Start-Up_ImageRecognition_notebook_part1” notebook and follow along; it contains all the code needed for this tutorial.

  3. Lots of snacks to stress eat during debugging

Before we begin, I also want to thank the following tutorials, on which I heavily based my own:

Training the Deep Learning Model

The technology behind image recognition most widely used today is Deep Learning, a subset of Machine Learning. Nam Do-san, Kim Yong-san, and Lee Chul-san are all Deep Learning engineers (in addition to being cybersecurity experts and full-stack engineers… lol, couldn’t be me).

Don’t worry if you don’t know much about Deep Learning; you don’t need any specific knowledge of it to get through this tutorial. Just know that Deep Learning uses neural networks loosely modeled after the brain. Each network is made up of neurons that receive input data, perform calculations on it using weights learned from training data, and then output a classification. A model usually stacks many layers of these neurons, with each layer in charge of identifying a specific structure in the image and building that structure up into a larger feature. That’s why it’s called “deep” learning! With the advent of GPU computing power, it’s become a very powerful technique for image recognition and is used in everything from creating deepfake videos to navigating driverless cars.
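To make the idea of stacked layers concrete, here is a toy Keras model. This is purely an illustrative sketch (it’s untrained, the layer sizes are arbitrary, and it is NOT the model we’ll use below), but it shows how layers build on each other:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

toy_model = Sequential([
    # early layers pick out simple structures like edges and corners...
    Conv2D(16, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    MaxPooling2D((2, 2)),
    # ...deeper layers combine those into larger features...
    Conv2D(32, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    # ...and the final layer outputs a probability for each class
    Dense(10, activation='softmax'),
])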

The keys to creating any good Deep Learning model are a large amount of quality training data and a well-designed model architecture. When Do-san’s model mistakenly classifies his father as a toilet, we can assume this was probably because his training data did not “see” many images of people who look like his father, or that his model was mistakenly picking up on some feature of his father that is also present in toilets (perhaps smooth, porcelain-like skin?). Or perhaps all the images of toilets the model was trained on also had people like Do-san’s father sitting on them, causing the model to falsely associate father figures with toilets (happens to the best of us).

drop that skincare routine, Mr. Nam

Luckily for us, we don’t have to worry about this part too much, because there is already a model out there, called VGG16, which comes pre-trained out of the box in Keras and was trained on 1.2 million training images, 50,000 validation images, and 150,000 testing images.
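If you’re curious what VGG16 actually looks like under the hood, one line of Keras will print a summary of its layers (this is the same model we’ll load for real in the code below):

from keras.applications.vgg16 import VGG16
VGG16(weights='imagenet').summary()  # downloads the pretrained weights on first run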

The Code

To start, we will mount our notebook onto our Google Drive by running the code below:

from google.colab import drive
import os
drive.mount('/content/drive')
path = "/content/drive/MyDrive/Start_Up_Data"
os.chdir(path)

The first two lines import the necessary libraries that we are going to use in this chunk of code.

The third line performs the command to mount your notebook onto your Drive, meaning you can now access any files in your Drive from the notebook. When you run it, it will take you to a separate window to confirm authorization and give you a code to paste back into the notebook.

The fourth line sets the path to the directory of data that you should have uploaded to your Google Drive from the GitHub repo linked at the beginning of the article.

The fifth line changes directories so that you are now inside your Data directory instead of your root directory.

If everything has executed correctly, you should now be able to run the following test code and see a picture of Han Ji-pyeong’s car.

import matplotlib.pyplot as plt
from keras.preprocessing import image as image_utils
from skimage.transform import resize
import numpy as np
img = plt.imread('startup_example_scene1.png')
plt.imshow(img)
vroom vroom

Now we have to preprocess the images so that they fit correctly into the model. The model architecture expects images that are 224x224x3, so we need to resize ours.

from PIL import Image
from keras.applications import imagenet_utils
image = []
for i in range(1, 4):
    filename = 'startup_example_scene' + str(i) + '.png'
    resized_img = Image.open(filename).convert('RGB')
    resized_img = image_utils.img_to_array(resized_img)
    resized_img = resize(resized_img, preserve_range=True, output_shape=(224, 224)).astype(int)  # reshape to 224x224x3
    image.append(resized_img)
X = np.array(image)

In the first two lines, we import image utility libraries to help with resizing the images.

Line 3 creates an empty list, and lines 4–9 loop through each of the three images, resizing it to 224x224x3 and appending it to the list.

In the final line, we convert the list of images into a NumPy array and name the batch of data X.
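As a quick sanity check, you can print the shape of the batch at this point; it should be 3 images of 224x224 pixels with 3 color channels:

print(X.shape)  # expected output: (3, 224, 224, 3)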

Now, we are ready to run our images through our model! Run the following lines of code:

from keras.applications.imagenet_utils import decode_predictions
from keras.applications.vgg16 import preprocess_input
from keras.applications.vgg16 import VGG16
import numpy as np
import argparse
import cv2
X = preprocess_input(X)
model = VGG16(weights='imagenet', include_top=True, input_shape=(224, 224, 3))
preds = model.predict(X)
P = decode_predictions(preds)

Lines 1–6 once again just import software libraries that we are going to use.

Line 7 preprocesses the images one more time, this time using the preprocess_input function that is built specially for the VGG16 model.

Line 8 loads in the VGG16 model and tells it what the input shape will be.

Line 9 runs the model on our images, producing a probability for each of the 1,000 ImageNet classifications the model was trained on.

Line 10 decodes the predictions into human-readable labels; by default, it returns the top five classifications by probability.

Now let’s see how the model did!

filename = 'startup_example_scene' + str(1) + '.png'
img = plt.imread(filename)
plt.imshow(img)
count = 1
for (ID, label, prob) in P[0]:
    print("{}. {}: {:.2f}%".format(count, label, prob * 100))
    count += 1

Line 1 gives the filename of the first photo, the screenshot of Han Ji-pyeong’s car.

Line 2 reads the file into the notebook.

Line 3 displays the photo.

Lines 4–7 are a for loop that prints the top five predicted classifications for the image, along with their probabilities.

As you should be able to see, it does just OK. Ideally the image of Han Ji-pyeong’s car should just say car, but instead the top result is racer, followed by police van. (The other two photos have similar results; you can run them in the notebook and see for yourself!)
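If you’d like to see all three sets of results at once, a small variation on the loop above prints the top five classes for every image in the batch (the exact labels will come from your own run):

for i, preds in enumerate(P):
    print('startup_example_scene{}.png'.format(i + 1))
    for count, (ID, label, prob) in enumerate(preds, start=1):
        print("  {}. {}: {:.2f}%".format(count, label, prob * 100))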

Results

You might have guessed that the reason it’s so bad at classifying the image as a car is that there’s more than just the car going on in the photo: there’s also a road, a bridge, and the faint outline of faces in the car. All in all, with all of the context in the photo, it does seem to look more like a “racer” than just a “car.” How could we improve this? Well, we could implement some sort of sliding window on the photo, so that the model focuses on classifying each object within the photo instead of classifying the entire photo. This is called object recognition! (Do-san uses it too, as shown in the neon green boxes that outline each object as it’s being identified.) And indeed it works with much better accuracy than vanilla image recognition. See the example below:

Using object recognition, the model can identify not only the car with 99% confidence, but the car’s tires and wheels too! I will cover how to create this object recognition model and output the above result in the next tutorial, so stay tuned!
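In the meantime, if you’re curious what the sliding-window idea looks like in code, here is a minimal sketch that reuses the VGG16 model we already loaded above. The window size and stride are arbitrary illustrative values, and a real object detector (like the one we’ll build next time) is much smarter about where it looks:

import numpy as np
from PIL import Image
from skimage.transform import resize
from keras.applications.imagenet_utils import decode_predictions
from keras.applications.vgg16 import preprocess_input

def classify_windows(model, filename, window=300, stride=150):
    # slide a square window across the image and classify each crop separately
    img = np.array(Image.open(filename).convert('RGB'))
    h, w, _ = img.shape
    for top in range(0, h - window + 1, stride):
        for left in range(0, w - window + 1, stride):
            # crop one window and shrink it to the 224x224x3 input VGG16 expects
            crop = img[top:top + window, left:left + window]
            crop = resize(crop, output_shape=(224, 224), preserve_range=True)
            batch = preprocess_input(np.expand_dims(crop, axis=0))
            # report just the top prediction for this window
            (_, label, prob) = decode_predictions(model.predict(batch))[0][0]
            print("window at ({}, {}): {} ({:.1f}%)".format(left, top, label, prob * 100))

classify_windows(model, 'startup_example_scene1.png')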

Congrats!!!! You’ve now finished all of Part 1 of the image recognition tutorial and have spun up your first image recognition model with just a few lines of code! You’re on your way to Sandbox wooooooo!!!!
