Our goal is to create a simple game called Emoji Hunter that will use our camera to identify real-world objects that look like emojis. A sample of the gameplay can be seen below:
When we start the Emoji Hunter game, our camera will be activated and the app will use CoreML to determine the dominant object in the frame; if it matches the target emoji within the specified time (10 seconds), then we have success.
Please note that because it uses the camera, this game has to be built to and run on a physical device -- the simulator is of limited value for running this app.
- Begin by creating an app with SwiftUI (it doesn't matter which Life Cycle you choose) called EmojiHunter. At this point, we know the drill.
- We are going to use a pre-built CoreML model that Apple provides. Go to https://developer.apple.com/machine-learning/models/, download the MobileNetV2.mlmodel file, and add it to the 'Models' directory you created for your application. Then click on the MobileNetV2 model in Xcode and read through the Metadata and Predictions tabs to understand what this CoreML model does and what its inputs and outputs are.
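To make the Predictions tab concrete, here is a small sketch of how the class Xcode auto-generates from the model file could be called directly. We won't actually call it this way in the app (we will go through Vision instead), and the input/output names shown here (image, classLabel, classLabelProbs) are what the downloaded model typically exposes -- verify them against the Predictions tab of your copy.
import CoreML
import CoreVideo

// Sketch only: calling the auto-generated MobileNetV2 class directly.
// `pixelBuffer` stands in for a 224x224 BGRA CVPixelBuffer you would supply yourself.
func classifyDirectly(_ pixelBuffer: CVPixelBuffer) {
    do {
        let model = try MobileNetV2(configuration: MLModelConfiguration())
        let output = try model.prediction(image: pixelBuffer)
        // classLabel is the single most likely label; classLabelProbs maps
        // every label the model knows to a confidence between 0 and 1.
        let confidence = output.classLabelProbs[output.classLabel] ?? 0
        print("Saw \(output.classLabel) with confidence \(confidence)")
    } catch {
        print("Prediction failed: \(error)")
    }
}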
- Create a new Swift file called Emoji.swift in your 'Models' directory and add in the following code:
protocol EmojiFoundDelegate {
    func emojiWasFound(result: Bool)
}

enum EmojiSearch {
    case found
    case notFound
    case searching
    case gameOver
}

struct EmojiModel {
    var emoji: String
    var emojiName: String
}

// Note: an empty array literal needs an explicit element type to compile.
let emojiObjects: [EmojiModel] = []
To the emojiObjects array, add at least 5 emoji models of your choosing. (Pro tip: laptops, pens, and books -- all pretty common for IS students -- are easy to find in a typical room and make good choices.) To add an emoji to an EmojiModel in Xcode, press Cmd + Ctrl + Space (command-control-space) and the emoji picker will pop up. Below is a sample:
EmojiModel(emoji: "💻", emojiName: "laptop")
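Filled in, the declaration might end up looking something like the sketch below. The five objects here are only suggestions -- pick things you can actually find, and note that the emojiName should be a word that appears in the label MobileNetV2 uses for that object (you will see those labels in the print output later).
// A sample emojiObjects array -- swap in your own choices.
let emojiObjects = [
    EmojiModel(emoji: "💻", emojiName: "laptop"),
    EmojiModel(emoji: "🖊", emojiName: "pen"),
    EmojiModel(emoji: "📖", emojiName: "book"),
    EmojiModel(emoji: "☕️", emojiName: "coffee"),
    EmojiModel(emoji: "🖱", emojiName: "mouse")
]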
- Even though we are using SwiftUI, because we are using a full-screen camera view, it will be easier to do this part in UIKit with a view controller, which we will wrap in a representable that returns a CameraViewController. To make this happen, we will create a 'ViewControllers' directory and add a file called CustomCameraRepresentable.swift with the following code:
import Foundation
import SwiftUI

struct CustomCameraRepresentable: UIViewControllerRepresentable {
    var emojiString: String
    @Binding var emojiFound: EmojiSearch

    func makeUIViewController(context: Context) -> CameraViewController {
        let controller = CameraViewController(emoji: emojiString)
        controller.delegate = context.coordinator
        return controller
    }

    func updateUIViewController(_ cameraViewController: CameraViewController, context: Context) {}

    func makeCoordinator() -> Coordinator {
        Coordinator(emojiFound: $emojiFound)
    }
}
This code needs a Coordinator, so in the same directory create Coordinator.swift and add the following:
import Foundation
import SwiftUI

class Coordinator: NSObject, UINavigationControllerDelegate, EmojiFoundDelegate {
    @Binding var emojiFound: EmojiSearch

    init(emojiFound: Binding<EmojiSearch>) {
        _emojiFound = emojiFound
    }

    func emojiWasFound(result: Bool) {
        print("emojiWasFound \(result)")
        emojiFound = .found
    }
}
- Now we need to add the view controller itself. Create CameraViewController.swift (remember, this is a Cocoa Touch Class and a subclass of UIViewController) and add the following:
import UIKit
import AVFoundation
import Vision
import CoreML
We are going to need AVFoundation (the audio-visual framework) here, as well as the Vision and CoreML frameworks. In addition, make this class adopt the AVCaptureVideoDataOutputSampleBufferDelegate protocol, and delete the starter code within the class definition. Replace that with the following (a sketch of the finished class declaration appears right after this code):
// MARK: - Setup VC and video settings
var delegate: EmojiFoundDelegate?
var captureSession: AVCaptureSession!
var previewLayer: AVCaptureVideoPreviewLayer!
private let videoDataOutput = AVCaptureVideoDataOutput()
var emojiString = ""

// Custom initializer so the representable can pass in the emoji we are hunting for
convenience init(emoji: String) {
    self.init()
    self.emojiString = emoji
}

override func viewDidLoad() {
    super.viewDidLoad()
    captureSession = AVCaptureSession()
    guard let videoCaptureDevice = AVCaptureDevice.default(for: .video) else { return }
    let videoInput: AVCaptureDeviceInput
    do {
        videoInput = try AVCaptureDeviceInput(device: videoCaptureDevice)
    } catch { return }
    if (captureSession.canAddInput(videoInput)) {
        captureSession.addInput(videoInput)
    }
    self.videoDataOutput.videoSettings = [(kCVPixelBufferPixelFormatTypeKey as NSString) : NSNumber(value: kCVPixelFormatType_32BGRA)] as [String : Any]
    self.videoDataOutput.alwaysDiscardsLateVideoFrames = true
    self.videoDataOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "camera_frame_processing_queue"))
    self.captureSession.addOutput(self.videoDataOutput)
    previewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
    previewLayer.frame = self.view.frame
    previewLayer.videoGravity = .resizeAspectFill
    view.layer.addSublayer(previewLayer)
    captureSession.startRunning()
}

// Called for every frame the camera delivers; hand the pixel buffer off for classification
func captureOutput(
    _ output: AVCaptureOutput,
    didOutput sampleBuffer: CMSampleBuffer,
    from connection: AVCaptureConnection) {
    guard let frame = CMSampleBufferGetImageBuffer(sampleBuffer) else {
        debugPrint("unable to get image from sample buffer")
        return
    }
    self.updateClassifications(in: frame)
}

override func viewWillAppear(_ animated: Bool) {
    super.viewWillAppear(animated)
    self.videoDataOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "camera_frame_processing_queue"))
    if captureSession?.isRunning == false {
        captureSession.startRunning()
    }
}

override func viewWillDisappear(_ animated: Bool) {
    super.viewWillDisappear(animated)
    if captureSession?.isRunning == true {
        captureSession.stopRunning()
    }
}

override var prefersStatusBarHidden: Bool {
    return true
}

override var supportedInterfaceOrientations: UIInterfaceOrientationMask {
    return .portrait
}
Taken together, the CameraViewController code sets up our camera and continually pulls frames from it, so that as the camera moves and sees new images, those images are processed. Of course, when you paste this code in you get an error: as we capture this data, we want to use CoreML to classify it, but we have nothing set up for classification yet. Onward Ho!
- We have to begin by getting our CoreML model set up with the following:
// MARK: - Setup ML Model
lazy var classificationRequest: VNCoreMLRequest = {
    do {
        let model = try VNCoreMLModel(for: MobileNetV2().model)
        let request = VNCoreMLRequest(model: model, completionHandler: { [weak self] request, error in
            self?.processClassifications(for: request, error: error)
        })
        request.imageCropAndScaleOption = .centerCrop
        return request
    } catch {
        fatalError("Failed to load Vision ML model: \(error)")
    }
}()
Of course, this and the prior code rely on two classification methods we need to add now:
// MARK: - Classification methods
func updateClassifications(in image: CVPixelBuffer) {
    DispatchQueue.global(qos: .userInitiated).async {
        let handler = VNImageRequestHandler(cvPixelBuffer: image, orientation: .right, options: [:])
        do {
            try handler.perform([self.classificationRequest])
        } catch {
            print("Failed to perform classification.\n\(error.localizedDescription)")
        }
    }
}

func processClassifications(for request: VNRequest, error: Error?) {
    DispatchQueue.main.async {
        guard let results = request.results else {
            return
        }
        let classifications = results as! [VNClassificationObservation]
        if !classifications.isEmpty {
            if classifications.first!.confidence > 0.5 {
                let identifier = classifications.first?.identifier ?? ""
                // Adding in a print statement so students can see how the classifier is processing images in real time
                print("Classification: Identifier \(identifier) ; Confidence \(classifications.first!.confidence)")
                if identifier.contains(self.emojiString) {
                    self.delegate?.emojiWasFound(result: true)
                }
            }
        }
    }
}
One thing to note is that in the processClassifications method, I've added a print statement so that you can see how the MobileNet model classifies what it is seeing and the likelihood of a correct classification. When you run the app later on the device (plugged into your laptop), you will see output like the following:
What you see here is that as the camera shakes ever so slightly in my hand (while looking at the pen in the top image), the model is constantly reevaluating based on the new input and producing close, but slightly different, confidence values. For our purposes in the game, if the model has confidence above 50%, it's a match. 😉
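One more hint about why the substring check in processClassifications works: the identifiers MobileNetV2 returns are ImageNet-style labels, which are often comma-separated lists of synonyms. The labels below are typical examples -- treat the exact strings as assumptions and confirm them against your own console output.
// Illustrative ImageNet-style identifiers (examples only -- check your console output).
let sampleIdentifiers = ["laptop, laptop computer",
                         "ballpoint, ballpoint pen, ballpen",
                         "notebook, notebook computer"]
let emojiName = "laptop"

// The same substring check used in processClassifications:
let matches = sampleIdentifiers.filter { $0.contains(emojiName) }
print(matches)  // ["laptop, laptop computer"]
This is also why choosing an emojiName that is actually a substring of the model's label matters -- watch the print output to see what the model calls your object.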
- Now it's time to turn our attention to a view model. Create GameViewModel.swift and create the basic GameViewModel class that conforms to the ObservableObject protocol. Also be sure to import SwiftUI, as we will be returning some SwiftUI view components. Within the class, set up the following basic properties:
// MARK: - Properties setup
@Published var currentLevel = 0
@Published var showNext = false
// Set up timer for the game
@Published var timer = Timer.publish(every: 1, on: .main, in: .common).autoconnect()
@Published var timeRemaining = 10
// Set initial emoji status to searching
@Published var emojiStatus = EmojiSearch.searching
- Our game is time-dependent, so we need to have some methods which handle timing matters:
// MARK: - Timer methods
func instantiateTimer() {
    self.timer = Timer.publish(every: 1, on: .main, in: .common).autoconnect()
}

func cancelTimer() {
    self.timer.upstream.connect().cancel()
}

func adjustTime() {
    if self.emojiStatus == .found {
        self.cancelTimer()
        self.timeRemaining = 10
    }
    else {
        if self.timeRemaining > 0 {
            self.timeRemaining -= 1
        }
        else {
            self.emojiStatus = .notFound
            self.showNext = true
        }
    }
}
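If the Timer.publish / autoconnect / upstream dance above feels like magic, here is a standalone sketch of the same Combine pattern, separate from the view model (you could run it in a playground):
import Combine
import Foundation

// Timer.publish(...) makes a connectable publisher; .autoconnect() starts the timer
// as soon as something subscribes. In the app, the .onReceive modifier you add to
// the view later is that subscriber; here we use sink instead.
let timer = Timer.publish(every: 1, on: .main, in: .common).autoconnect()
var ticks = 0

let subscription = timer.sink { date in
    ticks += 1
    print("tick \(ticks) at \(date)")
    if ticks == 3 {
        // Stop the timer the same way cancelTimer() does.
        timer.upstream.connect().cancel()
    }
}

// Keep the run loop alive long enough to see a few ticks.
RunLoop.main.run(until: Date().addingTimeInterval(5))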
- And finally, we need some game play methods to handle the logic of playing the basic game:
// MARK: - Gameplay methods
func startSearch() {
    if self.currentLevel == emojiObjects.count - 1 {
        self.emojiStatus = .gameOver
    }
    else {
        self.currentLevel = self.currentLevel + 1
        startRound()
    }
}

func restartGame() {
    self.currentLevel = 0
    startRound()
}

func startRound() {
    self.timeRemaining = 10
    self.emojiStatus = .searching
    self.showNext = false
    self.instantiateTimer()
}

func emojiResultText() -> Text {
    switch emojiStatus {
    case .found:
        return Text("\(emojiObjects[currentLevel].emoji) is FOUND")
            .font(.system(size: 50, design: .rounded))
            .fontWeight(.bold)
    case .notFound:
        return Text("\(emojiObjects[currentLevel].emoji) NOT FOUND")
            .font(.system(size: 50, design: .rounded))
            .foregroundColor(.red)
            .fontWeight(.bold)
    default:
        return Text(emojiObjects[currentLevel].emoji)
            .font(.system(size: 50, design: .rounded))
            .fontWeight(.bold)
    }
}
To make sure that everything is good, let's run our app on our phone now. It will only print "Hello World" on the screen (we haven't done anything to our default ContentView), but it's good to see that everything builds and to know that we've followed all the instructions to this point. Be sure to read through this view model and understand what it provides -- you will need all of these things in the next step. 😏
- Now this is where you earn your salt (or, more precisely, the first set of points for the second half of the exam); everything so far has been read, copy, and paste, but now we need to pull it together in the SwiftUI view. The amount of instruction is going down, but if you understand what we've done to this point, you will be fine, and I will provide some hints and guidance. Don't forget that you have screenshots of the app in action at the top of this lab that can also help you.
- In the body of ContentView.swift, we want to add a ZStack whose first level is either (A) a blank screen with just a green button saying "NEXT", shown when the view model's showNext value is true (because you didn't find the object) or the view model's emojiStatus is .found (because you did find it), AND the status is not .gameOver, or (B) the camera view, set up and running. To do the latter, use the following:
CustomCameraRepresentable(emojiString: emojiObjects[viewModel.currentLevel].emojiName, emojiFound: $viewModel.emojiStatus)
(I said limited code, not no code, and this line is hard to describe and time-consuming for you to research, so I'm giving it to you. But ask yourself this question: why is there a $ here? What does the $ denote in this case?)
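To help you reason about that question, here is a tiny self-contained sketch, separate from the lab code (the names here are made up): a property wrapper like @State or @Published stores a value, and the $ prefix gives you its projected value -- a Binding that lets a child view read and write the parent's data.
import SwiftUI

struct CounterParent: View {
    @State private var count = 0              // the source of truth lives here
    var body: some View {
        CounterChild(count: $count)            // $count hands the child a Binding<Int>
    }
}

struct CounterChild: View {
    @Binding var count: Int                    // the child can read AND mutate the parent's value
    var body: some View {
        Button("Tapped \(count) times") { count += 1 }
    }
}
The same mechanism is at work when the camera code reports a find and the view model's emojiStatus changes with it.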
- On the second level of the ZStack we have some messages at the bottom of the screen. If the view model's emoji status is .gameOver, then we want a red button saying "GAME OVER\nTap to restart" that restarts the whole game. But if the status is .searching, we want to display the time remaining. We want the timer text to be large, bright, and easy to see, but it also needs to update regularly, so there is one more modifier we need to add to this time-remaining text:
.onReceive(viewModel.timer) { _ in
    viewModel.adjustTime()
}
Of course, at the bottom of every screen is either the emoji you are hunting for or a message that the emoji was found or not found. (Hint: inside this ZStack, we probably want a VStack to handle this second level of the ZStack.)
- We are almost ready to build, but if we did it now, the app would crash, because starting the camera without asking permission violates Apple's security policy. To deal with that, we have to go to Info.plist and add the highlighted line (the Privacy - Camera Usage Description key, with a short message telling the user why the app needs the camera):
With that done, we can build again and play our game. 👊
- If you have gotten this far, you've earned 21 points (which is just above the average for the oral exam). To get the last few points (if you want them), you can do the following (mix and match with a maximum of 25 points):
- Launch Screen: you noticed that it takes a moment or two for the camera view to set up and load, so customize the launch screen so that it has the app name in large, bold letters, with your name and a one-sentence description of the app below that. (1 point)
- Score: right now we don't keep score, but having a score (both the total emojis found and the percent found, rounded to no decimal places) updated and displayed for the user is worth 2 more points. (2 points)
- Randomization: so it's less predictable, add 5 more emoji (10 minimum total) and randomly select the emoji that will be hunted. (1 point)
- Reset: like the famous slider game, we want to loop through those emojis until the user hits a reset button and resets the scores. You need to add another 5 emojis as well. (1 point, but presumes that the score functionality is in place)
Qapla'