
How To Build a Bot with OpenCV

September 12, 2020
Part of: OpenCV Object Detection in Games

Learn how to combine OpenCV object detection with PyAutoGUI and threading to build a custom Python video game bot. Using the OpenCV image recognition techniques discussed earlier in this tutorial series, we can now act on that detection data by performing mouse clicks automatically. We'll use Python threads so that our bot actions can happen independently of our object detection.

Links
Grab the code on GitHub: https://github.com/learncodebygaming/opencv_tutorials
PyAutoGUI: https://pyautogui.readthedocs.io/en/latest/
PyDirectInput: https://github.com/learncodebygaming/pydirectinput

Building a bot with OpenCV is a matter of combining the object detection techniques we've discussed (in the last 8 parts of this tutorial series) with some GUI automation. Now that you know how to find an object using OpenCV, you just need something like PyAutoGUI or PyDirectInput to click on the objects you find. But there are a few architectural decisions that can make it difficult to know where to get started. So I'll walk you through one approach that should be fairly flexible and act as a good foundation for building a more capable bot.
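One note before we get into the code: some games ignore the virtual mouse events that PyAutoGUI generates. If your clicks register on the desktop but not in-game, PyDirectInput (linked above) is designed as a drop-in replacement that sends DirectInput events instead, and it mirrors PyAutoGUI's function names. Here's a minimal sketch of that swap, using the same moveTo() and click() calls we'll rely on throughout this tutorial.

# if PyAutoGUI's clicks don't register in your game, PyDirectInput mirrors
# the same API, so the swap is mostly just a change of import
import pydirectinput

pydirectinput.moveTo(960, 540)   # move to an absolute screen position
pydirectinput.click()            # left click at the current mouse position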

The most straightforward way to add automated mouse clicks to our bot script is to do it inline. By that I mean we can take a screenshot, process it to find the objects we want, and then perform whatever actions we want our bot to take. We do that sequentially and just loop back around when the bot is done with its actions. Let me show you what that looks like, and then I'll point out some of the limitations of this method.

# main.py
from time import time, sleep
import pyautogui
# ...

while(True):

    # get an updated image of the game
    screenshot = wincap.get_screenshot()

    # do object detection
    rectangles = cascade.detectMultiScale(screenshot)

    # draw the detection results onto the original image
    detection_image = vision.draw_rectangles(screenshot, rectangles)

    # display the images
    cv.imshow('Matches', detection_image)

    # take bot actions
    if len(rectangles) > 0:
        # just grab the first object detection in the list and find the place
        # to click
        targets = vision.get_click_points(rectangles)
        target = wincap.get_screen_position(targets[0])
        pyautogui.moveTo(x=target[0], y=target[1])
        pyautogui.click()
        # wait 5 seconds for the mining to complete
        sleep(5)

    # press 'q' with the output window focused to exit.
    # waits 1 ms every loop to process key presses
    key = cv.waitKey(1)
    if key == ord('q'):
        cv.destroyAllWindows()
        break

print('Done.')

In this code, once we have the detected objects in the form of a list of rectangles, we first convert those into click positions using the get_click_points() and get_screen_position() functions we wrote previously. Then we take the very first screen position in that list, move the mouse there with pyautogui.moveTo(), and use PyAutoGUI again to perform the mouse click. Hopefully this has resulted in clicking on our target (a limestone deposit in Albion Online), and we pause our script for 5 seconds using sleep() to give our character time to do that mining.

One problem with this code is that it will take at least 5 seconds between each update of our debug output. That isn't great for seeing how well our object detection is working, compared to the continuous video stream we had before. But writing a bot this way will still work, and it keeps things simple if you don't mind doing the object detection only when you need it.

If you want to get more advanced, you can fix this problem by using threading.

Threading, or multithreading, is the ability that all modern computers have to run multiple threads of execution concurrently, much like they run multiple programs at once. Right now our bot script runs as just a single thread, but by using the Python threading library we can make it branch out and do multiple things at once (run in multiple threads).
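If you haven't used the threading module before, here's a minimal illustration of the idea, separate from the bot code: the worker function runs in its own thread while the main thread carries on without waiting for it.

# minimal threading illustration (not part of the bot code)
from threading import Thread
from time import sleep

def slow_task():
    # stand-in for some long-running bot action
    sleep(3)
    print('worker thread finished')

t = Thread(target=slow_task)
t.start()
# this prints immediately, because slow_task() is running in another thread
print('main thread is free to keep working')
t.join()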

In our case, what we'd like to do is have our main script running in the main thread, then branch off our bot actions into a separate thread. That way the bot can go off and do what it needs to do without blocking the execution of our main thread, where the object detection is happening.

# main.py
from threading import Thread
# ...

# this global variable is used to notify the main loop of when the bot
# actions have completed
is_bot_in_action = False

# this function will be performed inside another thread
def bot_actions(rectangles):
    if len(rectangles) > 0:
        # just grab the first object detection in the list and find the place to click
        targets = vision.get_click_points(rectangles)
        target = wincap.get_screen_position(targets[0])
        pyautogui.moveTo(x=target[0], y=target[1])
        pyautogui.click()
        # wait 5 seconds for the mining to complete
        sleep(5)

    # let the main loop know when this process is completed
    global is_bot_in_action
    is_bot_in_action = False


while(True):

    # get an updated image of the game
    screenshot = wincap.get_screenshot()

    # do object detection
    rectangles = cascade.detectMultiScale(screenshot)

    # draw the detection results onto the original image
    detection_image = vision.draw_rectangles(screenshot, rectangles)

    # display the images
    cv.imshow('Matches', detection_image)

    # take bot actions
    if not is_bot_in_action:
        is_bot_in_action = True
        # run the function in a thread that's separate from the main thread
        # so that the code here can continue while the bot performs its actions
        t = Thread(target=bot_actions, args=(rectangles,))
        t.start()

    # press 'q' with the output window focused to exit.
    # waits 1 ms every loop to process key presses
    key = cv.waitKey(1)
    if key == ord('q'):
        cv.destroyAllWindows()
        break

print('Done.')

In the main loop we call Thread() to create a new thread object in our program. We tell it what code to execute by giving it a function name as the target, and then passing in the parameters needed by that function as a tuple. We can then run that function in a separate thread by calling start() on the thread object.

In this example, I'm using a global variable to make sure only one bot action thread is running at a time. This is a simple way to pass information between threads, and it can be fine for smaller scripts, but global variables are generally discouraged and can make code difficult to maintain. So let me show you a better way to do this without them.

Let's put everything into threads. So we'll have one thread for capturing our screenshots, another for doing the object detection, another for the bot actions, and then our main thread coordinating all this.

We'll start with the object detection. Let's pull this out of the main script and put it into its own class.

# detection.py
import cv2 as cv
from threading import Thread, Lock


class Detection:

    # threading properties
    stopped = True
    lock = None
    rectangles = []
    # properties
    cascade = None
    screenshot = None

    def __init__(self, model_file_path):
        # create a thread lock object
        self.lock = Lock()
        # load the trained model
        self.cascade = cv.CascadeClassifier(model_file_path)

    def update(self, screenshot):
        self.lock.acquire()
        self.screenshot = screenshot
        self.lock.release()

    def start(self):
        self.stopped = False
        t = Thread(target=self.run)
        t.start()

    def stop(self):
        self.stopped = True

    def run(self):
        while not self.stopped:
            if self.screenshot is not None:
                # do object detection
                rectangles = self.cascade.detectMultiScale(self.screenshot)
                # lock the thread while updating the results
                self.lock.acquire()
                self.rectangles = rectangles
                self.lock.release()

In the constructor we initialize the cascade classifier, and we also create a lock object (which I'll talk more about in a minute). The classifier gets new screenshots to process via the update() method. There's a method to start the thread and another to stop it. The function that runs inside the thread is run(), which executes the detection code in a loop until the stopped property is set back to True. The results from the object detection are saved to the rectangles property.

So when our bot thread wants an updated list of object detections, it's going to grab that from the rectangles property on the Detection object that we will create in main.py. That can be a problem: remember that these will both be running in separate threads, and if the bot thread tries to read the list of rectangles while the detection thread is in the middle of updating it, that conflict can break your program. To prevent those sorts of issues, we use the lock object that we created in the constructor to lock this thread during property updates. We do that with the acquire() and release() calls. Once a thread has acquired a lock, no other thread can acquire the same one, and its code will block until that lock has been released.
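One more note on the lock: in the main loop we'll read detector.rectangles directly without taking the lock, which works out fine here because the property is swapped out in a single assignment. If you want to be stricter about it, you could add a small accessor method to the Detection class that takes the lock before reading. This is an optional, hypothetical helper, not something the rest of the tutorial depends on.

    # optional, hypothetical addition to the Detection class: read the latest
    # results while holding the same lock used by update() and run(), so a
    # reader never sees the property mid-update
    def get_rectangles(self):
        self.lock.acquire()
        rectangles = self.rectangles
        self.lock.release()
        return rectangles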

Now that you've seen how this class is written, let me show you how to use it.

# main.py
from detection import Detection
# ...

# initialize the WindowCapture class
wincap = WindowCapture('Albion Online Client')
# load the detector
detector = Detection('limestone_model_final.xml')
# load an empty Vision class
vision = Vision()

detector.start()

while(True):

    # get an updated image of the game
    screenshot = wincap.get_screenshot()

    # give detector the current screenshot to search for objects in
    detector.update(screenshot)

    # draw the detection results onto the original image
    detection_image = vision.draw_rectangles(screenshot, detector.rectangles)

    # display the images
    cv.imshow('Matches', detection_image)

    # press 'q' with the output window focused to exit.
    # waits 1 ms every loop to process key presses
    key = cv.waitKey(1)
    if key == ord('q'):
        detector.stop()
        cv.destroyAllWindows()
        break

In this code we simply initialize a detector object, we start the thread, and then each time through the main loop we refresh the screenshot it's working with. And when we quit our program we also must stop the detector thread. This script should work just like before, except now all of our object detection is happening inside a separate thread.

Now let's do the same thing to add threading to our WindowCapture class, and a new class for our bot actions. I won't show all that code here, because it's fairly redundant, but if you need clarification you can find it on GitHub.
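For reference, the threading additions to WindowCapture follow the same pattern as the Detection class: a start()/stop() pair and a run() loop that keeps the screenshot property up to date. Below is a rough sketch along those lines (the exact code in the repository may differ slightly), with get_screenshot() unchanged from earlier in the series.

# windowcapture.py (threading additions only, sketched; see GitHub for the full class)
from threading import Thread, Lock


class WindowCapture:

    # threading properties
    stopped = True
    lock = None
    screenshot = None
    # ... existing properties and methods from earlier in the series ...

    # note: __init__ (not shown) also needs to create the lock with self.lock = Lock()

    def start(self):
        self.stopped = False
        t = Thread(target=self.run)
        t.start()

    def stop(self):
        self.stopped = True

    def run(self):
        while not self.stopped:
            # get an updated image of the game
            screenshot = self.get_screenshot()
            # lock the thread while updating the results
            self.lock.acquire()
            self.screenshot = screenshot
            self.lock.release()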

Finally, we can focus on finishing our bot code and mining some limestone deposits! Let's start with the main script. Our bot won't always need the latest object detections or screenshot, so we'll only send over that information when the bot needs it, by keeping track of what state/mode our bot is in.

# main.py
import cv2 as cv
import numpy as np
import os
from time import time
from windowcapture import WindowCapture
from detection import Detection
from vision import Vision
from bot import AlbionBot, BotState


DEBUG = True

# initialize the WindowCapture class
wincap = WindowCapture('Albion Online Client')
# load the detector
detector = Detection('limestone_model_final.xml')
# load an empty Vision class
vision = Vision()
# initialize the bot
bot = AlbionBot((wincap.offset_x, wincap.offset_y), (wincap.w, wincap.h))

wincap.start()
detector.start()
bot.start()

while(True):

    # if we don't have a screenshot yet, don't run the code below this point yet
    if wincap.screenshot is None:
        continue

    # give detector the current screenshot to search for objects in
    detector.update(wincap.screenshot)

    # update the bot with the data it needs right now
    if bot.state == BotState.INITIALIZING:
        # while bot is waiting to start, go ahead and start giving it some targets to work
        # on right away when it does start
        targets = vision.get_click_points(detector.rectangles)
        bot.update_targets(targets)
    elif bot.state == BotState.SEARCHING:
        # when searching for something to click on next, the bot needs to know what the click
        # points are for the current detection results. it also needs an updated screenshot
        # to verify the hover tooltip once it has moved the mouse to that position
        targets = vision.get_click_points(detector.rectangles)
        bot.update_targets(targets)
        bot.update_screenshot(wincap.screenshot)
    elif bot.state == BotState.MOVING:
        # when moving, we need fresh screenshots to determine when we've stopped moving
        bot.update_screenshot(wincap.screenshot)
    elif bot.state == BotState.MINING:
        # nothing is needed while we wait for the mining to finish
        pass

    if DEBUG:
        # draw the detection results onto the original image
        detection_image = vision.draw_rectangles(wincap.screenshot, detector.rectangles)
        # display the images
        cv.imshow('Matches', detection_image)

    # press 'q' with the output window focused to exit.
    # waits 1 ms every loop to process key presses
    key = cv.waitKey(1)
    if key == ord('q'):
        wincap.stop()
        detector.stop()
        bot.stop()
        cv.destroyAllWindows()
        break

print('Done.')

We'll use a data structure called an enum to signify what state our bot is in, rather than using an ambiguous integer value, or a string value that we might spell wrong.

# bot.py
class BotState:
    INITIALIZING = 0
    SEARCHING = 1
    MOVING = 2
    MINING = 3
    BACKTRACKING = 4
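Strictly speaking, this is a plain class holding integer constants rather than an instance of Python's built-in Enum type, but it gives us the same readability. If you'd rather use the standard library, an equivalent definition would look like the sketch below; the rest of the code works the same either way, since we only ever compare states for equality.

# bot.py - optional alternative using the standard library's enum module
from enum import Enum


class BotState(Enum):
    INITIALIZING = 0
    SEARCHING = 1
    MOVING = 2
    MINING = 3
    BACKTRACKING = 4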

For the AlbionBot class itself, let me give it all to you now and then I'll point out the interesting bits.

# bot.py
import cv2 as cv
import pyautogui
from time import sleep, time
from threading import Thread, Lock
from math import sqrt


class AlbionBot:
    
    # constants
    INITIALIZING_SECONDS = 6
    MINING_SECONDS = 14
    MOVEMENT_STOPPED_THRESHOLD = 0.975
    IGNORE_RADIUS = 130
    TOOLTIP_MATCH_THRESHOLD = 0.72

    # threading properties
    stopped = True
    lock = None

    # properties
    state = None
    targets = []
    screenshot = None
    timestamp = None
    movement_screenshot = None
    window_offset = (0,0)
    window_w = 0
    window_h = 0
    limestone_tooltip = None
    click_history = []

    def __init__(self, window_offset, window_size):
        # create a thread lock object
        self.lock = Lock()

        # for translating window positions into screen positions, it's easier to just
        # get the offsets and window size from WindowCapture rather than passing in 
        # the whole object
        self.window_offset = window_offset
        self.window_w = window_size[0]
        self.window_h = window_size[1]

        # pre-load the needle image used to confirm our object detection
        self.limestone_tooltip = cv.imread('limestone_tooltip.jpg', cv.IMREAD_UNCHANGED)

        # start bot in the initializing mode to allow us time to get setup.
        # mark the time at which this started so we know when to complete it
        self.state = BotState.INITIALIZING
        self.timestamp = time()

    def click_next_target(self):
        # 1. order targets by distance from center
        # loop:
        #   2. hover over the nearest target
        #   3. confirm that it's limestone via the tooltip
        #   4. if it's not, check the next target
        # endloop
        # 5. if no target was found return false
        # 6. click on the found target and return true
        targets = self.targets_ordered_by_distance(self.targets)

        target_i = 0
        found_limestone = False
        while not found_limestone and target_i < len(targets):
            # if we stopped our script, exit this loop
            if self.stopped:
                break

            # load up the next target in the list and convert those coordinates
            # that are relative to the game screenshot to a position on our
            # screen
            target_pos = targets[target_i]
            screen_x, screen_y = self.get_screen_position(target_pos)
            print('Moving mouse to x:{} y:{}'.format(screen_x, screen_y))

            # move the mouse
            pyautogui.moveTo(x=screen_x, y=screen_y)
            # short pause to let the mouse movement complete and allow
            # time for the tooltip to appear
            sleep(1.250)
            # confirm limestone tooltip
            if self.confirm_tooltip(target_pos):
                print('Click on confirmed target at x:{} y:{}'.format(screen_x, screen_y))
                found_limestone = True
                pyautogui.click()
                # save this position to the click history
                self.click_history.append(target_pos)
            target_i += 1

        return found_limestone

    def have_stopped_moving(self):
        # if we haven't stored a screenshot to compare to, do that first
        if self.movement_screenshot is None:
            self.movement_screenshot = self.screenshot.copy()
            return False

        # compare the old screenshot to the new screenshot
        result = cv.matchTemplate(self.screenshot, self.movement_screenshot, cv.TM_CCOEFF_NORMED)
        # we only care about the value when the two screenshots are laid perfectly over one 
        # another, so the needle position is (0, 0). since both images are the same size, this
        # should be the only result that exists anyway
        similarity = result[0][0]
        print('Movement detection similarity: {}'.format(similarity))

        if similarity >= self.MOVEMENT_STOPPED_THRESHOLD:
            # pictures look similar, so we've probably stopped moving
            print('Movement has stopped')
            return True

        # looks like we're still moving.
        # use this new screenshot to compare to the next one
        self.movement_screenshot = self.screenshot.copy()
        return False

    def targets_ordered_by_distance(self, targets):
        # our character is always in the center of the screen
        my_pos = (self.window_w / 2, self.window_h / 2)
        # searched "python order points by distance from point"
        # simply uses the pythagorean theorem
        # https://stackoverflow.com/a/30636138/4655368
        def pythagorean_distance(pos):
            return sqrt((pos[0] - my_pos[0])**2 + (pos[1] - my_pos[1])**2)
        targets.sort(key=pythagorean_distance)

        # print(my_pos)
        # print(targets)
        # for t in targets:
        #    print(pythagorean_distance(t))

        # ignore targets that are too close to our character (within 130 pixels) to avoid
        # re-clicking a deposit we just mined
        targets = [t for t in targets if pythagorean_distance(t) > self.IGNORE_RADIUS]

        return targets

    def confirm_tooltip(self, target_position):
        # check the current screenshot for the limestone tooltip using match template
        result = cv.matchTemplate(self.screenshot, self.limestone_tooltip, cv.TM_CCOEFF_NORMED)
        # get the best match position
        min_val, max_val, min_loc, max_loc = cv.minMaxLoc(result)
        # if we can closely match the tooltip image, consider the object found
        if max_val >= self.TOOLTIP_MATCH_THRESHOLD:
            # print('Tooltip found in image at {}'.format(max_loc))
            # screen_loc = self.get_screen_position(max_loc)
            # print('Found on screen at {}'.format(screen_loc))
            # mouse_position = pyautogui.position()
            # print('Mouse on screen at {}'.format(mouse_position))
            # offset = (mouse_position[0] - screen_loc[0], mouse_position[1] - screen_loc[1])
            # print('Offset calculated as x: {} y: {}'.format(offset[0], offset[1]))
            # the offset I always got was Offset calculated as x: -22 y: -29
            return True
        #print('Tooltip not found.')
        return False

    def click_backtrack(self):
        # pop the top item off the clicked points stack. this will be the click that
        # brought us to our current location.
        last_click = self.click_history.pop()
        # to undo this click, we must mirror it across the center point. so if our
        # character is at the middle of the screen at ex. (100, 100), and our last
        # click was at (120, 120), then to undo this we must now click at (80, 80).
        # our character is always in the center of the screen
        my_pos = (self.window_w / 2, self.window_h / 2)
        mirrored_click_x = my_pos[0] - (last_click[0] - my_pos[0])
        mirrored_click_y = my_pos[1] - (last_click[1] - my_pos[1])
        # convert this screenshot position to a screen position
        screen_x, screen_y = self.get_screen_position((mirrored_click_x, mirrored_click_y))
        print('Backtracking to x:{} y:{}'.format(screen_x, screen_y))
        pyautogui.moveTo(x=screen_x, y=screen_y)
        # short pause to let the mouse movement complete
        sleep(0.500)
        pyautogui.click()

    # translate a pixel position on a screenshot image to a pixel position on the screen.
    # pos = (x, y)
    # WARNING: if you move the window being captured after execution is started, this will
    # return incorrect coordinates, because the window position is only calculated in
    # the WindowCapture __init__ constructor.
    def get_screen_position(self, pos):
        return (pos[0] + self.window_offset[0], pos[1] + self.window_offset[1])

    # threading methods

    def update_targets(self, targets):
        self.lock.acquire()
        self.targets = targets
        self.lock.release()

    def update_screenshot(self, screenshot):
        self.lock.acquire()
        self.screenshot = screenshot
        self.lock.release()

    def start(self):
        self.stopped = False
        t = Thread(target=self.run)
        t.start()

    def stop(self):
        self.stopped = True

    # main logic controller
    def run(self):
        while not self.stopped:
            if self.state == BotState.INITIALIZING:
                # do no bot actions until the startup waiting period is complete
                if time() > self.timestamp + self.INITIALIZING_SECONDS:
                    # start searching when the waiting period is over
                    self.lock.acquire()
                    self.state = BotState.SEARCHING
                    self.lock.release()

            elif self.state == BotState.SEARCHING:
                # check the given click point targets, confirm a limestone deposit,
                # then click it.
                success = self.click_next_target()
                # if not successful, try one more time
                if not success:
                    success = self.click_next_target()

                # if successful, switch state to moving
                # if not, backtrack or hold the current position
                if success:
                    self.lock.acquire()
                    self.state = BotState.MOVING
                    self.lock.release()
                elif len(self.click_history) > 0:
                    self.click_backtrack()
                    self.lock.acquire()
                    self.state = BotState.BACKTRACKING
                    self.lock.release()
                else:
                    # stay in place and keep searching
                    pass

            elif self.state == BotState.MOVING or self.state == BotState.BACKTRACKING:
                # see if we've stopped moving yet by comparing the current pixel mesh
                # to the previously observed mesh
                if not self.have_stopped_moving():
                    # wait a short time to allow for the character position to change
                    sleep(0.500)
                else:
                    # reset the timestamp marker to the current time. switch state
                    # to mining if we clicked on a deposit, or search again if we
                    # backtracked
                    self.lock.acquire()
                    if self.state == BotState.MOVING:
                        self.timestamp = time()
                        self.state = BotState.MINING
                    elif self.state == BotState.BACKTRACKING:
                        self.state = BotState.SEARCHING
                    self.lock.release()
                
            elif self.state == BotState.MINING:
                # see if we're done mining. just wait some amount of time
                if time() > self.timestamp + self.MINING_SECONDS:
                    # return to the searching state
                    self.lock.acquire()
                    self.state = BotState.SEARCHING
                    self.lock.release()

Check out the run() method first, as it acts as the main controller for the bot. Depending on what state the bot is in, the controller takes the appropriate actions, and it's also in charge of switching our bot to a new state.

In the BotState.INITIALIZING state we are simply giving ourselves 6 seconds to get ready before our bot begins. The BotState.MINING state is equally simple: we just wait 14 seconds for the mining to complete.

The BotState.MOVING state is interesting, because we don't know exactly how long it will take our character to get to a new position after a click. To determine this I've written have_stopped_moving(), which uses matchTemplate() to compare the previous screenshot to the current one. If the two images are sufficiently similar, we conclude that our character has stopped moving.

The BotState.SEARCHING state is where most of the action happens, because this is where we find our next object to click on. In click_next_target() I first take the object detection list and sort it by how far away each object is from our character. I want to start with the nearest objects and check whether they are valid limestone deposits before checking objects that are farther away. But I don't want to check objects that are right on top of my character, because I found that those are usually the same deposit I just finished mining. So I remove those very close objects from the list (which may or may not be a good idea for your own bot).

To determine whether an object detection is actually a limestone deposit, I'm relying on the fact that in Albion Online you get a tooltip popup when you hover over certain objects. So I wrote confirm_tooltip(), which again uses matchTemplate() to decide whether the object the mouse is currently hovering over is a good detection or not. Only when that tooltip is found is a mouse click actually performed.

Finally, sometimes my character can end up in a place where no limestone deposits can be found on the screen. To deal with this situation I created the BotState.BACKTRACKING state and wrote click_backtrack(). Whenever the bot makes a mouse click I save that click position to a list, so that I have a history of all clicks that have been made. When the bot can find no more valid targets, we can then reverse those clicks, rewinding our character back into a position where it should be able to find deposits again.

In click_backtrack() we are treating click_history as a data structure known as a "stack". The last item added to the list, using append(), is the first item we remove from it, using pop(). And to undo a click we can't simply click on the same screen position again. We must mirror it across our character's position. So if our original click was to the character's upper right by X and Y amount, to undo that click we must click to the character's lower left by the same X and Y amount.

There are obviously still a ton of things we could do to make this bot better, but this should be enough to get you started on building your own bot. From here it's a process of testing, finding issues, and correcting them... and just doing that over and over again until you've got a refined bot that you're happy with.

If you've been following along and writing code, now's the time to set out on your own path and just explore and have fun. Make what you want to make, and make it your own. Just keep building, keep coding, and you'll keep learning and getting better. And if you've made it this far and still haven't started writing code yet, now's the time! We only get better through practice, and there's no way around it: you've got to put the hours in yourself if you want to be a programmer. A project like this is the perfect way to build valuable experience and problem solving skills that you'll carry with you for the rest of your life. Good luck!

