This post was originally published at http://www./blog/how-i-realize-quick-macro-by-python/

I have been playing a certain mobile game for about a year. As a casual player, my attitude toward in-game resources is that "enough is enough", so my stockpile has always hovered right at the subsistence line. That easy life lasted until the latest directive came out: it outrageously demands 50 heavy-construction pulls, which forced this lazy player to start running short logistics missions by hand, tapping a few buttons every hour or two.

A friend who saw me doing this exclaimed, "Hardly anyone still grinds by hand these days!" When I asked how, the answer was 按键精灵, a popular click-macro tool. That took me back to middle school, when I used it to bot a turn-based game; all these years later it is somehow still alive. But times have changed: as a programmer, why not write the macro script myself? So I did, with my old friend Python.

## Analysis

### Simulating clicks

The first problem to solve is simulating mouse clicks. After some searching I settled on the autopy library, which can simulate mouse movement and clicks. The project home page is https://github.com/msanders/autopy, and this is the version pip installs. However, the project has fallen into disrepair and fails to install on current versions of macOS, so it is better to use a fork maintained by someone else:
```
pip3 install git+https://github.com/potpath/autopy.git
```
Once installed, usage is simple:
```python
import autopy

# move the cursor smoothly to the target coordinate (x, y), then click
autopy.mouse.smooth_move(x, y)
autopy.mouse.click()
```
### Image matching

With mouse clicks simulated, we still need to locate the target to click, which calls for image matching. autopy ships with its own image-matching component, but it matches pixel by pixel, which is extremely slow and not very accurate, so I decided to use the more specialised opencv together with PIL instead.

Install the dependencies:
```
pip3 install Pillow
pip3 install imutils
pip3 install opencv-python
```
The approach for finding a small template image inside a large image is:

1. convert both images to grayscale
2. extract edges
3. run template matching
4. take the result with the highest similarity
5. obtain the template's starting point (top-left corner) in the large image and the size of the matched region
The code:
```python
def match(small_path, large_path):
    # grayscale + edge extraction on the template
    small = cv2.imread(small_path)
    small = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    small = cv2.Canny(small, 50, 200)

    # same preprocessing on the large image, then template matching
    large = cv2.imread(large_path)
    large = cv2.cvtColor(large, cv2.COLOR_BGR2GRAY)
    large = cv2.Canny(large, 50, 200)
    result = cv2.matchTemplate(large, small, cv2.TM_CCOEFF)
    _, max_value, _, max_loc = cv2.minMaxLoc(result)
    return (max_value, max_loc, 1, result)
```
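As a quick illustration (my own sketch, not from the original post; the file names are placeholders), drawing the best match onto the large image looks like this:

```python
max_value, max_loc, ratio, result = match('small.png', 'large.png')

small = cv2.imread('small.png')
height, width = small.shape[:2]

# max_loc is the top-left corner; the matched region has the template's size
large = cv2.imread('large.png')
cv2.rectangle(large, max_loc, (max_loc[0] + width, max_loc[1] + height), (0, 0, 255), 2)
cv2.imwrite('match_debug.png', large)
```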
As the sketch shows, max_loc is the starting (top-left) coordinate of the match, and the matched region has the same size as small, i.e. height, width = small.shape[:2].

### Scaled image matching

The method above only works when the template was cropped directly out of the large image. As soon as the large image changes size, for example when the game window is shrunk slightly, matching fails. We need an approach that still matches after the image has been scaled.

After some searching I found this article: http://www./2015/01/26/multi-scale-template-matching-using-python-opencv/

The basic idea is to repeatedly shrink the large image by a fixed ratio, run one match at each scale until the image becomes too small, and return the match with the highest similarity across all scales. The code:
```python
def scale_match(small_path, large_path):
    small = cv2.imread(small_path)
    small = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    small = cv2.Canny(small, 50, 200)
    height, width = small.shape[:2]

    large = cv2.imread(large_path)
    large = cv2.cvtColor(large, cv2.COLOR_BGR2GRAY)
    current_max = None

    # shrink the large image step by step and match at each scale
    for scale in numpy.linspace(0.2, 1.0, 20)[::-1]:
        resized = imutils.resize(large, width=int(large.shape[1] * scale))
        r = large.shape[1] / float(resized.shape[1])
        # if the resized image is smaller than the template, stop
        if resized.shape[0] < height or resized.shape[1] < width:
            break

        resized = cv2.Canny(resized, 50, 200)
        result = cv2.matchTemplate(resized, small, cv2.TM_CCOEFF_NORMED)
        _, max_value, _, max_loc = cv2.minMaxLoc(result)
        if current_max is None or max_value > current_max[0]:
            current_max = (max_value, max_loc, r, result)
    return current_max
```
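Since scale_match returns coordinates in the resized image along with the ratio r, mapping the match back onto the original image goes roughly like this (again my own sketch with placeholder file names):

```python
max_value, max_loc, r, result = scale_match('small.png', 'large.png')

small = cv2.imread('small.png')
height, width = small.shape[:2]

# max_loc refers to the resized image, so scale it back up by r
start_x, start_y = int(max_loc[0] * r), int(max_loc[1] * r)
end_x, end_y = int((max_loc[0] + width) * r), int((max_loc[1] + height) * r)

large = cv2.imread('large.png')
cv2.rectangle(large, (start_x, start_y), (end_x, end_y), (0, 0, 255), 2)
cv2.imwrite('scale_match_debug.png', large)
```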
### Multiple matches

What if the template appears several times in the large image and we want to find every occurrence? The solution given in the official opencv tutorial is to set a threshold: unlike the image-matching section above, which keeps only the single most similar result, every location whose similarity exceeds the threshold counts as a match. The code:
```python
points = []
loc = numpy.where(result >= threshold)
for point in zip(*loc[::-1]):
    points.append((numpy.float32(point[0]), numpy.float32(point[1])))
```
This approach has a drawback: it is almost impossible to find a threshold that guarantees everything above it is a true match. It also matches the same region repeatedly, so in practice some regions get matched many times while others are missed entirely. In my scenario the number of regions to match is fixed (4), which reminded me of the data-mining course from my undergraduate days: this is exactly what k-means is for. So I set a fairly low threshold to collect a large number of candidate matches, then run k-means on them to find the 4 centre points. Fortunately opencv ships with a k-means implementation, so there is no need to pull in another library:
```python
points = numpy.array(points)
term_crit = (cv2.TERM_CRITERIA_EPS, 30, 0.1)
# 4 clusters, 10 attempts; flags=0 is cv2.KMEANS_RANDOM_CENTERS
ret, labels, centers = cv2.kmeans(points, 4, None, term_crit, 10, 0)
```
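centers comes back as a 4×2 float32 array of (x, y) cluster centres. For a quick visual check they can be rounded and drawn onto the searched image (my own debugging sketch; the path is a placeholder):

```python
# draw each cluster centre for debugging
large = cv2.imread('large.png')  # placeholder: the image that was searched
for cx, cy in centers:
    cv2.circle(large, (int(cx), int(cy)), 5, (0, 0, 255), -1)
cv2.imwrite('centers_debug.png', large)
```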
The resulting centers are exactly the 4 centre points we need, one for each region to match.

### OCR

After matching an image, we sometimes also want to run OCR on it to read the text currently on screen. I use Google's open-source OCR engine tesseract. Installation:
```
brew install tesseract
pip3 install pytesseract
```
Download the language data from https://github.com/tesseract-ocr/tesseract/wiki/Data-Files; for example, the Simplified Chinese data file is https://github.com/tesseract-ocr/tessdata/raw/4.00/chi_sim.traineddata. Then find the tesseract installation directory:
```
$ brew info tesseract
tesseract: stable 3.05.01 (bottled), HEAD
OCR (Optical Character Recognition) engine
https://github.com/tesseract-ocr/
/usr/local/Cellar/tesseract/3.05.01 (80 files, 98.6MB) *
...
```
and put the data file under the relative directory ./share/tessdata/ (here, /usr/local/Cellar/tesseract/3.05.01/share/tessdata/).

The OCR flow is:

1. compute the clipping region from the match result
2. clip that region out of the large image
3. feed the clip to tesseract
The code:
```python
def ocr(self, matchings):
    import pytesseract
    pytesseract.pytesseract.tesseract_cmd = "/usr/local/bin/tesseract"
    texts = []
    if type(matchings) is not list:
        matchings = [matchings]
    for m in matchings:
        # clipping region in original-image coordinates; size is (height, width)
        start_x, start_y = (int(m["loc"][0] * m["ratio"]), int(m["loc"][1] * m["ratio"]))
        end_x, end_y = (int((m["loc"][0] + m["size"][1]) * m["ratio"]), int((m["loc"][1] + m["size"][0]) * m["ratio"]))
        clip = self.large_gray[start_y:end_y, start_x:end_x]
        image = Image.fromarray(clip)
        texts.append(pytesseract.image_to_string(image, lang='chi_sim'))
    return texts
```
Since OCR is an optional component, the import is placed inside the function.

## Implementation

Wrapping the functions above gives the Recognizer class:
```python
import cv2
import imutils
import numpy

from PIL import ImageGrab, Image
from time import sleep


class Recognizer():
    def __init__(self, large):
        if isinstance(large, str):
            large = cv2.imread(large)
        self.large_origin = large
        self.large_gray = cv2.cvtColor(large, cv2.COLOR_BGR2GRAY)
        self.large_size = large.shape[:2]

    def match(self, small, scale=False):
        if isinstance(small, str):
            small = cv2.imread(small)

        small = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
        small = cv2.Canny(small, 50, 200)
        size = small.shape[:2]
        print("match: [{}x{}] in [{}x{}]".format(size[0], size[1], self.large_size[0], self.large_size[1]))

        if scale:
            current_max = None

            for ratio in numpy.linspace(0.2, 1.0, 20)[::-1]:
                resized = imutils.resize(self.large_gray, width=int(self.large_size[1] * ratio))
                r = self.large_size[1] / float(resized.shape[1])
                # if the resized image is smaller than the template, stop
                if resized.shape[0] < size[0] or resized.shape[1] < size[1]:
                    break

                resized = cv2.Canny(resized, 50, 200)
                result = cv2.matchTemplate(resized, small, cv2.TM_CCOEFF_NORMED)
                _, max_value, _, max_loc = cv2.minMaxLoc(result)
                if current_max is None or max_value > current_max['value']:
                    current_max = {"value": max_value, "loc": max_loc, "size": size, "ratio": r, "result": result}

            return current_max
        else:
            large = cv2.Canny(self.large_gray, 50, 200)
            result = cv2.matchTemplate(large, small, cv2.TM_CCOEFF)
            _, max_value, _, max_loc = cv2.minMaxLoc(result)
            return {"value": max_value, "loc": max_loc, "size": size, "ratio": 1, "result": result}

    def multi_match(self, small, scale=False, cluster_num=1, threshold=0.8):
        m = self.match(small, scale)
        matchings = []
        points = []

        loc = numpy.where(m["result"] >= threshold)

        for point in zip(*loc[::-1]):
            points.append((numpy.float32(point[0]), numpy.float32(point[1])))

        points = numpy.array(points)
        term_crit = (cv2.TERM_CRITERIA_EPS, 30, 0.1)
        ret, labels, centers = cv2.kmeans(points, cluster_num, None, term_crit, 10, 0)
        for point in centers:
            matchings.append({"value": m["value"], "loc": point, "size": m["size"], "ratio": m["ratio"], "result": m["result"]})
        print('K-Means: {} -> {}'.format(len(loc[0]), len(matchings)))
        return matchings

    def draw_rect(self, matchings, output_path):
        large_origin = self.large_origin.copy()
        if not isinstance(matchings, list):
            matchings = [matchings]
        for m in matchings:
            start_x, start_y = (int(m["loc"][0] * m["ratio"]), int(m["loc"][1] * m["ratio"]))
            end_x, end_y = (int((m["loc"][0] + m["size"][1]) * m["ratio"]), int((m["loc"][1] + m["size"][0]) * m["ratio"]))
            cv2.rectangle(large_origin, (start_x, start_y), (end_x, end_y), (0, 0, 255), 2)
        cv2.imwrite(output_path, large_origin)

    def draw_clip(self, clips, output_path):
        if type(clips) is not list:
            cv2.imwrite(output_path, clips)
        else:
            for index, clip in enumerate(clips):
                path = output_path.format(index)
                cv2.imwrite(path, clip)

    def clip(self, matchings):
        clips = []

        if not isinstance(matchings, list):
            matchings = [matchings]
        for m in matchings:
            start_x, start_y = (int(m["loc"][0] * m["ratio"]), int(m["loc"][1] * m["ratio"]))
            end_x, end_y = (int((m["loc"][0] + m["size"][1]) * m["ratio"]), int((m["loc"][1] + m["size"][0]) * m["ratio"]))
            clip = self.large_origin[start_y:end_y, start_x:end_x]
            clips.append(clip)
        return clips

    def ocr(self, matchings):
        import pytesseract
        pytesseract.pytesseract.tesseract_cmd = "/usr/local/bin/tesseract"
        texts = []
        if not isinstance(matchings, list):
            matchings = [matchings]
        for m in matchings:
            start_x, start_y = (int(m["loc"][0] * m["ratio"]), int(m["loc"][1] * m["ratio"]))
            end_x, end_y = (int((m["loc"][0] + m["size"][1]) * m["ratio"]), int((m["loc"][1] + m["size"][0]) * m["ratio"]))
            clip = self.large_gray[start_y:end_y, start_x:end_x]
            image = Image.fromarray(clip)
            texts.append(pytesseract.image_to_string(image, lang='chi_sim'))
        return texts

    def center(self, matching):
        # the trailing / 2 presumably converts Retina (2x) screenshot pixels to screen coordinates
        x = int((matching["loc"][0] + matching["size"][1] / 2) * matching["ratio"] / 2)
        y = int((matching["loc"][1] + matching["size"][0] / 2) * matching["ratio"] / 2)
        return x, y
```
Plus a screenshot helper, capture_screen:
```python
def capture_screen():
    screenshot = ImageGrab.grab().convert('RGB')
    screenshot = numpy.array(screenshot)
    return cv2.cvtColor(screenshot, cv2.COLOR_RGB2BGR)
```
Note that PIL can only convert to RGB while opencv works in BGR, so one extra conversion is needed.

## 🌰 Example

Recognise three regions, fight, clock and frame, in a full screenshot. The fight, clock and frame templates are defined as follows:

[template images: fight, clock, frame]
Outline them in the large screenshot:
```python
screenshot = capture_screen()
main_rgz = Recognizer(screenshot)

fight_path = '/Users/binss/Desktop/opencv/templates/fight.png'
clock_path = '/Users/binss/Desktop/opencv/templates/clock.png'
frame_path = '/Users/binss/Desktop/opencv/templates/frame.png'
fight = main_rgz.match(fight_path, True)
clock = main_rgz.match(clock_path, True)
frame = main_rgz.match(frame_path, True)

matchings = [fight, clock, frame]
output_path = '/Users/binss/Desktop/debug.png'
main_rgz.draw_rect(matchings, output_path)
```
This produces the following image:

[screenshot with the fight, clock and frame regions outlined]
Extract the frame region on its own:
```python
clips = main_rgz.clip(frame)
main_rgz.draw_clip(clips[0], '/Users/binss/Desktop/frame_clip.png')
```
Result:

[image: the clipped frame region]
Match multiple line regions inside the frame region and run OCR on them. The line template is defined as follows:

[template image: line]
That is, the frame is cut into four lines and each line is OCR'd separately:
```python
line_path = '/Users/binss/Desktop/opencv/templates/line.png'
time_rgz = Recognizer(clips[0])
matchings = time_rgz.multi_match(line_path, True, 4, 0.2)
texts = time_rgz.ocr(matchings)
print(texts)
```
The result:
```
K-Means: 8 -> 4
['后勤支援中 8 一 1 OO:19:44', '后勤支援中 8 一 1 OO:19:44', '后勤支援中 1 一 4 01:17:26', '后勤支援中 0 一 1 00:48:57']
```
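The post stops at recognition, but wiring a match back into autopy is the easy part. A minimal sketch of the glue (my own, not shown in the original code), reusing the Recognizer, capture_screen and fight_path defined above:

```python
import autopy

rgz = Recognizer(capture_screen())
fight = rgz.match(fight_path, True)   # scaled match of the fight button

x, y = rgz.center(fight)              # centre of the matched region
autopy.mouse.smooth_move(x, y)
autopy.mouse.click()
```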
## Summary

I have never taken any computer vision courses; playing with opencv and writing this post was purely a whim, so treat it as a conversation starter, and better solutions are very welcome. In the end the whole thing turned out not to be practical at all:

- Matching is too slow. Even without multiprocessing it already consumes a lot of resources; CPU usage shoots up the moment it starts running.
- I tried porting it to Windows 10, but both opencv and autopy threw all kinds of errors and I never got it running, so I gave up.
- It cannot run in the background.
In the end I obediently crawled back to 按键精灵. Let this post be a memorial to my lost weekend.