Hi there! Today we will be exploring object recognition, as seen in episode 2 of Start-Up!

In the previous article, we explored image recognition (https://guolikewhoa.medium.com/creating-all-the-tech-projects-in-start-up-kdrama-part-1-image-recognition-ai-814bc38b9111). Object recognition uses similar technology, but instead of classifying the entire image, it classifies specific objects within an image. This task is usually achieved with an R-CNN deep learning model. R-CNN stands for Region-Based Convolutional Neural Network, and you can think of it as a different flavor of neural network than the VGG16 model we used in the last article. R-CNN’s architecture is designed to draw bounding boxes around objects, extract and identify features from within those boxes, and then finally use those features to classify those boxes. R-CNNs are even used to power Google Lens!
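To make that pipeline concrete, here is a minimal pseudocode-style sketch of those three stages. The three helper names (propose_regions, extract_features, classify_region) are hypothetical placeholders for illustration, not real library calls:

# A conceptual sketch of the R-CNN pipeline. The three helper names below
# are hypothetical placeholders, not real API calls.
def rcnn_pipeline(image):
  detections = []
  for box in propose_regions(image):          # 1. propose candidate bounding boxes
    features = extract_features(image, box)   # 2. run a CNN over each region
    label, score = classify_region(features)  # 3. classify the region's features
    detections.append((box, label, score))
  return detections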

The R-CNN model we will be using today comes out of the box from TensorFlowHub, and is trained on the Open Images V4 dataset (a different dataset from the ImageNet one the VGG16 model from the last article was trained on).

Intro

A bit of background if you’re new to the blog: I’m a recent CS grad with a lot of time on her hands and an unhealthy obsession with the kdrama Start-Up (#teamHanJi-pyeong #theGoodestBoy)! Since I’m currently rocking the ~unemployed summer vibe~ I thought it would be fun to try to recreate all the tech projects in the show! This is really just for fun and is written so that it requires no prior CS background and no hardware besides an internet connection and a Google account to follow along.

I will be posting more projects throughout the summer, so follow me on Twitter if you want to stay updated: @GuoLikeWhoa

In my next article, I will be attempting to create the font generator from episode 5 of Start-Up!

What you will need:

  1. Colab account (free to obtain): https://colab.research.google.com/

Colab is a free interactive Python environment that Google provides, no credit card or payment necessary! It’s essentially a Jupyter notebook, so it just contains Python code. If you would like, you can also copy the code into a local Python script and run it that way if that’s easier for you.

  2. Code from this repository: StartUp_ImageRecognition

In this GitHub repo are three important pieces: a folder called Data with three pictures in it from the Start-Up show that we will use as our example images, a Python notebook titled “Start-Up_ImageRecognition_notebook_part1,” and a Python notebook titled “Start-Up_ImageRecognition_notebook_part2.” Follow the readme for instructions on how to load the Data folder into your Google Drive. Then you can open the “Start-Up_ImageRecognition_notebook_part2” notebook and follow along; it contains all the code needed for this tutorial.

Before we begin, I also want to thank the following tutorial from TensorFlow, which I use in this article:

https://www.tensorflow.org/hub/tutorials/object_detection

The example object detection code there is the same code I use in this article, just with a few minor tweaks to take in our Start-Up screenshot images!

The Code

Now to start, we will mount our Google Drive onto our notebook by running the code below:

from google.colab import drive
import os
drive.mount('/content/drive')
path = "/content/drive/MyDrive/Start_Up_part_2_image_recognition"
os.chdir(path)

The first two lines import the necessary libraries that we are going to use in this chunk of code.

The third line performs the command to mount your Drive onto your notebook, meaning you can now access any files in your Drive from your notebook. When you run it, it will take you to a separate window to confirm authorization, then give you an authorization code to paste back into the notebook.

The fourth line sets the path to the directory of data that you should have uploaded to your Google Drive from the GitHub repo linked at the beginning of the article.

And the fifth line changes directories so that you are now inside that project directory instead of your root directory.

Now, we need to make sure our Colab notebook is running on a GPU instance. GPU stands for graphics processing unit; it is a processor designed for the heavy parallel computations that deep learning networks need, which it handles much faster than a CPU. Luckily, Google provides free GPU resources for us! They can be enabled by going to: Runtime > Change runtime type > Hardware accelerator: GPU (make sure it says GPU here; if it doesn’t, change it) > Save
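If you want to double-check that a GPU actually got attached, you can run this quick check in a cell (these are standard TensorFlow calls, nothing specific to this tutorial):

import tensorflow as tf

# Prints something like '/device:GPU:0' when a GPU is attached,
# or an empty string if the runtime is CPU-only
print(tf.test.gpu_device_name())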

Now, we will load all the necessary libraries for the tutorial:

import tensorflow as tf
import tensorflow_hub as hub
import matplotlib.pyplot as plt
import tempfile
from six.moves.urllib.request import urlopen
from six import BytesIO
import numpy as np
from PIL import Image
from PIL import ImageColor
from PIL import ImageDraw
from PIL import ImageFont
from PIL import ImageOps
import time

print(tf.__version__)
print("The following GPU devices are available: %s" % tf.test.gpu_device_name())

The TensorFlow library provides us with APIs and functions to create neural networks.

TensorFlow Hub is a repository of already-trained deep learning models.

Matplotlib will allow us to display our images before and after they have been classified.

PIL is an imaging library that we will use to process our images.

And the rest of the imports perform smaller routine tasks like allowing us to save files, timing how long our model takes to classify our images, and converting images into arrays.
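As a quick illustration of that last point, here is how PIL and NumPy fit together, using the first example screenshot (the same file we will load for real later):

# Open the first example screenshot with PIL, then convert it to a numpy array
pil_image = Image.open('/content/drive/MyDrive/Start_Up_part_1_image_recognition/startup_example_scene1.png')
arr = np.array(pil_image)
print(arr.shape)  # (height, width, channels)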

Now, the helper functions are defined:

def display_image(image):
  # Show the image on a large matplotlib figure with gridlines turned off
  fig = plt.figure(figsize=(20, 15))
  plt.grid(False)
  plt.imshow(image)

display_image will use matplotlib to display our image (in our case, screenshot scenes from Start-Up).

def open_and_resize_image(filename_image, new_width=256, new_height=256,
                          display=False):
  # Create a temporary .jpg file to hold the resized image
  _, filename = tempfile.mkstemp(suffix=".jpg")
  pil_image = Image.open(filename_image)
  # Crop and scale the image to the requested dimensions
  pil_image = ImageOps.fit(pil_image, (new_width, new_height), Image.ANTIALIAS)
  pil_image_rgb = pil_image.convert("RGB")
  pil_image_rgb.save(filename, format="JPEG", quality=90)
  print("Image downloaded to %s." % filename)
  if display:
    display_image(pil_image)
  return filename

open_and_resize_image is a function that takes in an image’s filename, opens the image, and resizes it to the dimensions that the R-CNN will expect later, saving the result to a temporary JPEG file.

def draw_bounding_box_on_image(image, ymin, xmin, ymax, xmax, color, font,
                               thickness=4, display_str_list=()):
  draw = ImageDraw.Draw(image)
  im_width, im_height = image.size
  (left, right, top, bottom) = (xmin * im_width, xmax * im_width,
                                ymin * im_height, ymax * im_height)
  draw.line([(left, top), (left, bottom), (right, bottom), (right, top),
             (left, top)], width=thickness, fill=color)

  # If the labels would extend past the top of the image, stack them below
  # the box instead of above it
  display_str_heights = [font.getsize(ds)[1] for ds in display_str_list]
  total_display_str_height = (1 + 2 * 0.05) * sum(display_str_heights)

  if top > total_display_str_height:
    text_bottom = top
  else:
    text_bottom = top + total_display_str_height

  # Reverse the list and draw the labels from bottom to top
  for display_str in display_str_list[::-1]:
    text_width, text_height = font.getsize(display_str)
    margin = np.ceil(0.05 * text_height)
    draw.rectangle([(left, text_bottom - text_height - 2 * margin),
                    (left + text_width, text_bottom)], fill=color)
    draw.text((left + margin, text_bottom - text_height - margin),
              display_str, fill="black", font=font)
    text_bottom -= text_height - 2 * margin

def draw_boxes(image, boxes, class_names, scores, max_boxes=10, min_score=0.1):
  colors = list(ImageColor.colormap.values())

  try:
    font = ImageFont.truetype("/usr/share/fonts/truetype/liberation/LiberationSansNarrow-Regular.ttf", 25)
  except IOError:
    print("Font not found, using default font.")
    font = ImageFont.load_default()

  for i in range(min(boxes.shape[0], max_boxes)):
    if scores[i] >= min_score:
      ymin, xmin, ymax, xmax = tuple(boxes[i])
      display_str = "{}: {}%".format(class_names[i].decode("ascii"),
                                     int(100 * scores[i]))
      color = colors[hash(class_names[i]) % len(colors)]
      image_pil = Image.fromarray(np.uint8(image)).convert("RGB")
      draw_bounding_box_on_image(image_pil, ymin, xmin, ymax, xmax, color,
                                 font, display_str_list=[display_str])
      np.copyto(image, np.array(image_pil))
  return image

draw_boxes and draw_bounding_box_on_image will draw the bounding boxes around the objects that the R-CNN detects and label each box with its classification.

Now, we open and resize the image:

resized_img = open_and_resize_image('/content/drive/MyDrive/Start_Up_part_1_image_recognition/startup_example_scene1.png', 1280, 856, True)

When this code block runs, you should see Han Ji-pyeong’s car!

module_handle = "https://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1"
detector = hub.load(module_handle).signatures['default']

Here, we download the Faster R-CNN model (with an Inception ResNet V2 backbone) from TensorFlow Hub and load it into the variable detector.
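If you are curious what this detector will hand back, you can peek at its declared outputs before running it (structured_outputs is a standard attribute of TensorFlow concrete functions). The three keys we rely on later are detection_boxes, detection_class_entities, and detection_scores:

# Inspect the detector's declared outputs (a dict of named tensors);
# we will use 'detection_boxes', 'detection_class_entities', and
# 'detection_scores' in run_detector below
print(detector.structured_outputs)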

def load_img(path):
  # Read the file from disk and decode it into a 3-channel (RGB) image tensor
  img = tf.io.read_file(path)
  img = tf.image.decode_jpeg(img, channels=3)
  return img

This code block defines a helper that loads the image from the temporary file we created previously, so we can feed it into the detector.
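As a quick optional sanity check, you can call it on the resized image from earlier and look at the tensor’s shape:

# load_img returns a tf.Tensor of shape (height, width, 3)
img = load_img(resized_img)
print(img.shape)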

def run_detector(detector, path):
  img = load_img(path)

  converted_img = tf.image.convert_image_dtype(img, tf.float32)[tf.newaxis, ...]
  start_time = time.time()
  result = detector(converted_img)
  end_time = time.time()

  result = {key: value.numpy() for key, value in result.items()}

  print("Found %d objects." % len(result["detection_scores"]))
  print("Inference time: ", end_time - start_time)

  image_with_boxes = draw_boxes(
      img.numpy(), result["detection_boxes"],
      result["detection_class_entities"], result["detection_scores"])

  display_image(image_with_boxes)

Finally, this code feeds the image into the detector, then displays it with the drawn bounding boxes, the classifications, and the probability for each classification.

Now we run it on our image:

run_detector(detector, resized_img)

You should see the car scene we displayed previously, but now with all the object detections displayed.

Now you can run the remaining code cells and see the classifications on the other two images!
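If you would rather run them all in one cell, a loop like this works. Note that the two filenames below are hypothetical placeholders; substitute the actual names of the other two screenshots in your Data folder:

# Hypothetical filenames -- replace them with the real names of the other
# two screenshots in your Data folder
for name in ['startup_example_scene2.png', 'startup_example_scene3.png']:
  image_path = open_and_resize_image(name, 1280, 856, True)
  run_detector(detector, image_path)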
