KissKissBan: A Competitive Human Computation Game for Image Annotation

Chien-Ju Ho, Tao-Hsuan Chang, Jong-Chuan Lee, Jane Yung-jen Hsu, and Kuan-Ta Chen

PDF Version | Contact Us

Abstract

In this paper, we propose a competitive human computation game, KissKissBan (KKB), for image annotation. KKB is different from other human computation games since it integrates both collaborative and competitive elements in the game design. In a KKB game, one player, the blocker, competes with the other two collaborative players, the couples; while the couples try to find consensual descriptions about an image, the blocker's mission is to prevent the couples from reaching consensus. Because of its design, KKB possesses two nice properties over the traditional human computation game. First, since the blocker is encouraged to stop the couples from reaching consensual descriptions, he will try to detect and prevent coalition between the couples; therefore, these efforts naturally form a player-level cheating-proof mechanism. Second, to evade the restrictions set by the blocker, the couples would endeavor to bring up a more diverse set of image annotations. Experiments hosted on Amazon Mechanical Turk and a gameplay survey involving 17 participants have shown that KKB is a fun and efficient game for collecting diverse image annotations.

1  Introduction

Since the introduction of the ESP Game, Games With A Purpose (GWAP) [1] successfully demonstrates the potential of using human efforts to accomplish tasks that are intractable for computers. Useful information is generated as a side effect of game play. Many applications have been developed to solve different computationally hard problems, such as commonsense collection [2] and music annotation [3].
Current GWAP implementations rely on player collaborations to gather desired information. For example, players in the ESP Game are given the same image, and the descriptions which they both agree become the image labels. Players will be rewarded, e.g., gaining game points, for achieving consensus opinions. The matching mechanism encourages players to contribute relevant information and provides a validation for the information contributed by players. By changing the objects presented to the players (e.g., images or music) and the descriptions players are required to provide (e.g., labels or preferences), this mechanism can be applied to solve various AI-hard problems.
However, there are some known issues in this kind of collaborative mechanism. First, players can benefit from cheating by forming coalitions. If players agree on typing "a" for every image, no useful information will be collected. This issue is usually dealt by system-level approaches, e.g. pairing players from different places or using pre-recorded bots to break the coalition. Second, players tend to give easier and more generic descriptions [4], and therefore the diversity of the output becomes limited. While taboo words [5] have been proposed to alleviate this problem, they can only be constructed after an image has been played for a couple of times. Moreover, taboo words present additional information besides the image and may cause a potential problem in biasing player behavior. For instance, players may tend to guess "guy" when seeing an image with a taboo word "man".
This research proposes a novel human computation game, KissKissBan (KKB), which addresses these issues by introducing competitive element into the game. Besides the typical two-player matching mechanism, KKB introduces an additional player, the blocker, whose objective is to stop the matching from happening. By entering labels prior to the matching process, the blocker provides a blocked word list. Other players, the couples, would get penalties for guessing the blocked words. In contrast to the taboo words, the blocked words are not visible to the couples. While the blocker is in competition with the other two players, he/she is motivated to break the coalitions between the couples. Therefore, KKB naturally provides a player-level cheating-proof mechanism. Besides, the invisible blocked words set the restrictions of the available words, thus the couples are encouraged to provide more diverse labels to evade the restrictions.
This paper starts by briefly reviewing human computation and the image naming process in a psychological view. We then introduce the design and implementation of KKB. Finally, we explain the evaluation results and outline the contribution and future work of this research.

2  Related Work

In this section, we introduce the recent developments in human computation games and give a brief overview on picture naming process in psychology.

2.1  Human Computation Games

Human computation aims to solve problems that are hard for computers by utilizing human brain powers. For example, Amazon Mechanical Turk1 provides a marketplace for the developer to outsource human intelligence tasks. Human computation games, also called Games With A Purpose (GWAP) [1], propose that using computer games can gather human players and solve open problems as a side effect of playing. The GWAP approach has been shown to be useful and is widely used in various domains, such as image tagging [5,6], commonsense collection [2], and music annotation [3].
To ensure the quality of the collected labels, most GWAP implementations adopt consensus opinions as the correctness measure. Taking the ESP Game as an example, image labels are generated by collecting descriptions which both players agree in the game. In [7], Luis von Ahn proposed three game-templates, which summaries their successful experiences in deploying GWAPs. The three templates, namely input-agreement games, output-agreement games, and inversion-problem games, rely on player's collaborations to collect consensus opinions. In contrast to previous work, we demonstrate the game design integrating competitive elements.

2.2  Picture Naming

In the ESP Game [5], players perform the process of giving descriptions to the images. This process usually involves naming objects in the image and is well studied as picture-naming process. In psychology, an important measure in picture-naming process is how fast a person can name a given picture correctly. The naming latency is determined by two main factors, namely the word frequency and age of acquisition (AoA) [8]. Word frequency is the times of occurrence in large word corpus, and age of acquisition is the age at which player learned the word. Typically, players name faster when the words are learned earlier or have higher frequency. While there are discussions about which factor is more important [9,10], we know that the word properties, i.e. the word frequency and AoA, affect the player efforts and the responsive time in naming the pictures.
In our game design, we adopt two time intervals, 7 seconds and 30 seconds, in different game stages. Investigating the differences between the image labels collected in two stages may provide additional information besides the matching events.

3  Game Mechanisms

KissKissBan (KKB) is designed to be played by three online players. One of the players is the "blocker" and the other two players are the "couples". With the same image presented, the couples try to match (Kiss) with each other by typing the same word and the blocker tries to stop couples from matching (Ban). Actually, KissKissBan is named by combining the objectives of game players.
The game rules are described as follows. In the begining of each round, the blocker has 7 seconds to provide blocked word list, which is the list of words he/she thinks couples might match on. These blocked words are not visible to the couples. After the 7 seconds of entering blocked words, the couples have 30 seconds to match with each other. The game time will decrease by 5 seconds if any couple types the blocked word, i.e., being blocked. Also, agreeing on the blocked word does not count as matching. The couples win the round if they successfully match with each other within the time limit, otherwise the blocker wins. Players switch roles every 5 rounds in 15 rounds of the game.

3.1  Different Roles

There are two different roles in KKB:
figures/blocker.png (a) The user interface for the blocker after the blocker finish typing the blocked words. The blocker can monitor all the words the couples are typing.
figures/couple.png (b) The user interface for the couple when the couple is blocked. When the couple guesses the blocked word, he/she will be punished by decreasing the available game time.
Figure 1: Screenshots of the game.

3.2  Incentive Structure

KKB is a three-player zero sum game. In the current implementation, the blocker loses 200 points and each couple gains 100 points when the couples win; the blocker gains 200 points and each couple loses 100 points when the couples lose. While the blocker and the couples are strictly competitive, the blocker is motivated to prevent the couples from cheating. Therefore, KKB provides a player-level cheating-proof mechanism.
Another difference between the blocker and the couples is their available time period for entering words. In order to increase the possibility of blocking, the blocker will have to type as many words as possible in the short period of time, i.e., 7 seconds. Studies [9,10] indicate that the time latency in picture naming is related to the properties of words, e.g., the word frequency or age of acquisition. Therefore, the blocked words given by the blocker may be more common and easier. On the other hand, the couples have sufficient time to consider their strategies for matching. Under the belief that the blocker would give more common and easier words as blocked words, the couples are motivated to give harder words. Therefore, the mechanism of KKB promotes a diverse set of output.

3.3  Collected Labels

In the ESP Game, image labels are generated when two players match with each other. In KKB, there are two kinds of labels, namely the matching label and the blocking label. The matching label is defined as the label agreed by both couples, and the blocking label is the label that is generated when any couple guesses the blocked words. Actually, the blocking label is the matching between the couple and the blocker.
According to this definition, it is possible to generate more than one label in each round since the couples might be blocked for several times in the same round. This mechanism increases the efficiency of labeling.

3.4  Why Not Use Taboo Words

The blocked words are similar to the taboo words, but there are two main differences between the mechanisms. First, the taboo words are generated from the statistics of past play history, but the blocked words are entered by the blocker in each play. Second, the taboo words are visible by players but the blocked words are not.
Generating the blocked words which are invisible to the couples provides more uncertainties, and the couples would be motivated to provide more diverse labeling in order not to match the blocked words. Besides, the presence of the taboo words may have a potential problem of biasing player behavior. When seeing an image with a taboo word "sky", players may tend to guess some related words, such as "blue" or "cloud", instead of analyzing the image content. This phenomena can be illustrated by the auto-playing bot written by Weber et al [4]. They designed a bot which plays the ESP Game according to their trained language model. Even without analyzing the image, they can match with normal player at the rate of 81% for images with taboo words presented.

3.5  Implementation

KissKissBan follows the client-server architecture. To facilitate the deployment of the game, the client is implemented by Adobe Flash, and the server is written in pure Python. During the game, the server logs all the player activities into the database for future analysis. We also implement bots, which replay the actions performed by previous players, for games with less than three players.

4  Evaluation

To evaluate the efficiency and quality of the collected data, we published KissKissBan on Amazon Mechanical Turk to collect game data. In addition, we conducted a user survey involving 17 players to evaluate the gameplay of KKB.
The evaluation is base on the 125 images randomly chosen from the dataset released by the ESP Game. The labels in the dataset are served as partial ground truth in the evaluation.

4.1  Efficiency

In our experiments, KissKissBan has been played for 537 times. 5521 labels, including 3296 blocking labels and 2225 matching labels, have been generated in total 4955 rounds of play. On average, there are 1.11 labels generated in each round, and 3.20 labels are collected in each minute for one game. Though we introduce a new player to decrease the chance of matching, the game is still efficient in generating image labels.
The quality of the labels is evaluated using the descriptions provided by the ESP Game Dataset. In average, there are 13.89 descriptions per image for the images we used. Taking these descriptions as "partial" ground truth for image labeling, the precision of our collected labels is 78.84% and the recall is 70.02%. The precision is the ratio of our labels to be correct. This result demonstrates the data quality of our collected labels. Actually, the correct ratio should be much higher since we only use "partial" ground truth for evaluation. The recall of 70.02% shows the diversity of KKB labels.

4.2  Property of Collected Labels

To evaluate the data properties of collected labels, we re-implemented the ESP Game for comparison. We will call this re-implemented version as K-ESP. To simplify the comparison, there is no taboo word in K-ESP. After publishing on Amazon Mechanical Turk, K-ESP has been played for 5977 rounds, and 4994 labels are collected.
In 4994 labels generated by K-ESP, 6.56 distinct labels are collected for each image, whereas KKB has collected 11.54 distinct labels per image out of 5521 labels in total. The data entropy calculated from K-ESP is 4.79, while the data entropy in KKB is 7.18. These statistics suggest that KKB motivates players to give more diverse labels. Table 1 gives an example of different labels generated by KKB and K-ESP.
figures/sample1.png
K-ESP ML-KKB BL-KKB
man * 21 beach * 3 sea * 9
beach * 10 water * 3 man * 8
karate * 5 sand * 3 ocean * 3
water * 1 sea * 2 black * 1
ninja * 1 china * 1
kungfu * 1 sand * 1
ocean * 1

4.3  Gameplay Survey

To evaluate the gameplay, we conduct a questionnaire survey involving 17 participants, consisting of 11 males and 6 females. The participants are all college students, whose ages are ranging from 21 to 25. Before answering the questions, they are required to play the game at least 2 times, i.e. 30 rounds. Some of them have played the game over 10 times.
In the anonymous survey, the average enjoyability rating is 3.76 out of 5, and 88 percent of the subjects claim that they will play the game again. In addition, over 60% of the players think it's more fun and challenging to play as the blocker.

5  Conclusion

We have presented KissKissBan, a competitive human computation game for image annotation. While the ESP Game and other GWAPs present using player collaborations for collecting useful information, the main contribution of this paper is to demonstrate the integration of competitive elements into human computation games. In the design of KKB, we have two advantages over traditional human computation games: 1) KKB provides a player-level cheating-proof mechanism which can alleviate coalition between players; 2) KKB motivates players to contribute more diverse labeling and therefore collects a broader set of data. Evaluations on Amazon Mechanical Turk and a gameplay survey have shown KKB to be an efficient and fun game for collecting diverse image annotations.
In our future work, we will explore how players behave when they play as different game roles. The differences between KKB labels, namely the matching labels and the blocking labels, is also an important subject to investigate.

Acknowledgement

This work was supported in part by Taiwan E-learning and Digital Archives Programs (TELDAP) sponsored by the National Science Council of Taiwan under the grants NSC98-2631-001-011 and NSC98-2631-001-013. It was also supported in part by the National Science Council of Taiwan under the grants NSC97-2221-E-001-009.

References

[1] Luis von Ahn. Games with a purpose. IEEE Computer Magazine, 39(6):92-94, 2006.
[2] Luis von Ahn, Mihir Kedia, and Manuel Blum. Verbosity: a game for collecting common-sense facts. In CHI '06: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 75-78. ACM Press, 2006.
[3] Edith L. M. Law, Luis von Ahn, Roger B. Dannenberg, and Mike Crawford. Tagatune: A game for music and sound annotation. In International Conference on Music Information Retrieval (ISMIR' 07), pages 361-364, 2003.
[4] Ingmar Weber, Stephen Robertson, and Milan Vojnovic. Rethinking the ESP game. Technical report, Microsoft Research, September 2008.
[5] Luis von Ahn and Laura Dabbish. Labeling images with a computer game. In CHI '04: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 319-326. ACM Press, 2004.
[6] Luis von Ahn, Ruoran Liu, and Manuel Blum. Peekaboom: a game for locating objects in images. In CHI '06: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 55-64. ACM Press, 2006.
[7] Luis von Ahn and Laura Dabbish. Designing games with a purpose. Commun. ACM, 51(8):58-67, 2008.
[8] J.B. Carroll and M.N. White. Word frequency and age of acquisition as determiners of picture-naming latency. The Quarterly Journal of Experimental Psychology, 25(1):85-95, 1973.
[9] C.M. Morrison, A.W. Ellis, and P.T. Quinlan. Age of acquisition, not word frequency, affects object naming, not object recognition. Memory and Cognition, 20:705-705, 1992.
[10] C. Barry, K.W. Hirsh, R.A. Johnston, and C.L. Williams. Age of acquisition, word frequency, and the locus of repetition priming of picture naming. Journal of Memory and Language, 44(3):350-375, 2001.

Footnotes:

1. https://www.mturk.com/mturk/


Sheng-Wei Chen (also known as Kuan-Ta Chen)
http://www.iis.sinica.edu.tw/~swc 
Last Update September 19, 2017