Placing Virtual Machines to Optimize Cloud Gaming Experience

Hua-Jun Hong, De-Yu Chen, Chun-Ying Huang, Kuan-Ta Chen, and Cheng-Hsin Hsu


1  Introduction

To offer on-demand gaming services to many gamers using heterogeneous client computers, including game consoles, desktops, laptops, smartphones, and set-top boxes, increasingly more service providers push computer games to powerful cloud servers and stream the game scenes to a simple application running on client computers [2]. Such on-demand game services are referred to as cloud gaming by various companies, such as Gaikai, Ubitus, and OnLive. Market research predicts that the cloud gaming market is going to grow to 8 billion USD by 2017 [3], and some leading game development companies [4] have seriously considered this new opportunity. Therefore, we expect to see many more cloud gaming services soon.
Offering cloud gaming services in a commercially-viable way is, however, very challenging, as demonstrated by OnLive's financial difficulty [5]. The main challenge for cloud gaming providers is to find the best tradeoff between two contradicting objectives: reducing the hardware investment and increasing the gaming Quality-of-Experience (QoE). Satisfactory gaming QoE demands high-end hardware, which may incur a huge financial burden; meanwhile, using low-end hardware leads to less pleasing gaming QoE, which may drive gamers away from cloud gaming services. Moreover, different game genres impose diverse hardware requirements, which may result in insufficient or wasted hardware resources if server resources are not well planned. For example, the servers configured for cutting-edge 3D first person shooter games may be an overkill for 2D casual games. Such server diversity makes finding the best tradeoff between profit and QoE even harder.
arch.png
Figure 1: The architecture of cloud gaming services, where GS denotes cloud gaming server.
Since cloud gaming services push games to cloud servers, server consolidation enables dynamic resource allocation among game servers serving multiple gamers for better overall performance and lower operational cost. In this paper, we study the problem of efficiently consolidating multiple cloud gaming servers on a physical machine using modern virtual machines (VMs), such as VMware and VirtualBox, in order to provide high gaming QoE in a cost-effective way, as illustrated in Fig. 1. We consider the VM placement problem to maximize the total profit while providing the just-good-enough QoE to gamers. This problem is referred to as provider-centric problem throughout this paper.
The considered problem is a variation of the virtual network embedding problem [6], and thus is NP-Complete. The existing solutions for network embedding problems [7,8,9,10,6], however, are designed for computational/storage intensive applications, without taking the real-time requirements of cloud gaming (and other highly interactive applications) into consideration. In particular, unlike computational/storage intensive applications that demand high CPU/disk throughput, cloud games demand high QoE, in terms of, e.g., responsiveness, precision, and fairness [11,12,13]. Hence, the existing virtual network embedding algorithms do not work for cloud gaming providers. To the best of our knowledge, this paper is the first attempt to tackle the VM placement problem to maximize the cloud gaming QoE.
In particular, this paper makes the following contributions:

2  Related Work

2.1  General Cloud Applications

Optimizing general cloud applications has been studied in cloud environments. For example, Zaman et al. [15] propose an auction-based mechanism for dynamic provision and allocation of VMs to maximize the provider's profit and improve the total utilization of cloud resources. Lin et al. [16] formulate the data replication problem in the clouds as a mathematical optimization problem and propose several algorithms for I/O intensive applications. In our work, we formulate the VM placement problem of cloud gaming systems and propose optimization algorithms to solve the problem. Different from these two studies [15,16], we optimize real-time cloud games with QoE-aware algorithms that maximize the provider's profit while optimizing the gaming quality at the same time.
VM migration techniques have been investigated for non real-time applications. Marzolla et al. [17] utilize the live migration technology to move the VMs away from lightly loaded physical servers so that the emptied servers can be switched to low-power mode. Ferreto et al. [18] design a dynamic server consolidation algorithm with migration control that avoids unnecessary migrations to reduce the number of powered-on servers and the migration cost. Chen et al. [19] find that virtual machines do not usually use all their resources, and they propose an energy-saving algorithm that also considers the migration cost based on records of migration history. Speitkamp and Bichler [20] present a heuristic solution that approximates the optimal solution by not only considering the cost but also determining whether the problem size can be optimally solved. Nathuji et al. [21] create a performance interference model and classify applications into different resource bounds using historical data. The applications are then consolidated on physical servers for better Quality of Service (QoS). Zhu and Tung [22] also consider the interference and implement a system that determines the placement of VMs so as to avoid interference and meet the desired QoS values. None of the aforementioned studies take cloud gaming QoE levels into consideration.

2.2  Cloud Games

The benefits of game server consolidation have been studied for certain game genres. For example, Lee and Chen [23] address the server consolidation problem for Massively Multiplayer Online Role-Playing Game (MMORPG). In particular, they propose a zone-based algorithm to leverage spatial locality of gamers in order to reduce the hardware requirements at the servers. Their work is different from ours for two reasons. First, we consider cloud gaming that streams high-quality real-time videos to gamers, while MMORPG servers only send low bitrate status updates. Second, we explicitly optimize gaming QoE in this paper, while they only attempt to save energy at the data centers without taking QoE into consideration.
Duong et al. [24] and Wu et al. [25] are complementary to our work, as they concentrate on minimizing the queueing delay of a cloud gaming system, while we focus on the user experience during the game sessions. For example, Duong et al. [24] develop resource provisioning and waiting queue scheduling algorithms to admit selective incoming gamers for the best profit under user-specified maximal waiting times. Wu et al. [25] also propose an online control algorithm to quickly serve users in the waiting queue. Compared to their work, we optimize the gaming QoE after a user is admitted in the system; such QoE maximization is arguably more important, as gamers typically can only tolerate a few minutes of waiting time, but each game session may last for hours.
Most of the cloud gaming systems, including Gaikai, Ubitus, and OnLive are proprietary and closed, and thus measuring cloud gaming performance and QoS on them is hard, if not impossible. We employ GamingAnywhere (GA) [14] for our experiments, which is an open cloud gaming system. In particular, we use GA to derive the performance and QoS models for different games on different VMs, and to develop VM placement algorithms. Last, our initial investigations on the QoE-aware virtual machine placement problems were reported in Hong et al. [1].
vm_compare.png
Figure 2: Virtualization overhead depends on game and VM implementations.
fitting_limbo130823.png
Figure 3: Measurement results for CPU utilization, GPU utilization, frame rate, and processing delay. Sample results from Limbo.

3  Measurement Studies

We conduct measurement studies to model the implications of consolidating multiple cloud gaming servers on a physical machine. We set up the GA server [14] on VMware Workstation 9 and VirtualBox 4.2.6. The GA client runs on another machine without VMs. The two Windows 7 machines running the GA server and client are connected via a wired network, and they are equipped with an Intel i7 3.4 GHz CPU and 24 GB memory, and an Intel i5 2.8 GHz CPU and 4 GB memory, respectively. We install an NVIDIA Quadro 6000 GPU on the GA server. We choose three games in different genres: Limbo, Sudden Strike: Normandy (Normandy), and Police Supercars Racing (PSR), and measure various performance metrics over 5-min game sessions with different configurations. We consider four metrics relevant to the VM placement problem: (i) CPU utilization: the average CPU load measured on the physical server, (ii) GPU utilization: the average GPU load measured on the physical server, (iii) frame rate: the average number of frames streamed per second, and (iv) processing delay: the average time for the GA server to receive, render, capture, encode, and transmit a frame, which is measured by the techniques proposed in Chen et al. [26].
We first compare the performance of GA running on the host OS and that running on a single VM with all available resources allocated to it. Fig. 2 gives some sample results, which reveals that: (i) VMs lead to nontrivial overhead, (ii) different VMs result in different amount of overhead, and (iii) different games incur different workloads that may have distinct performance implications on different VMs. Hence, more extensive measurements are required to derive the prediction model of GA performance in each game/VM pair.
Table 1: R-square Values of Different Game/VM Pairs
Game      VM          CPU     GPU     FPS     DELAY
Limbo     VMware      0.9910  0.9837  0.9767  0.9955
Limbo     VirtualBox  1.0000  0.9877  0.9933  0.9996
Normandy  VMware      0.9999  1.0000  0.9865  0.9995
Normandy  VirtualBox  0.9991  0.9986  0.9764  0.9995
PSR       VMware      0.5758  0.9961  0.9917  0.9974
PSR       VirtualBox  0.9898  0.9360  0.9969  0.9943
Next, we vary the number of VMs on the server, while equally dividing the 8 CPU cores among all VMs. In particular, we conduct the measurements with 1, 2, 4, and 8 VMs. We plot the sample results from Limbo in Fig. 3. This figure reveals that the CPU utilization, GPU utilization, frame rate, and processing delay can be modeled as sigmoid functions of the number of VMs on a physical server, which are also plotted as curves in Fig. 3. We note that several basic functions, such as ax+b, ax^2+bx+c, a/x, a−a/x, log(x), and √x, may also be used as regression models [27]. After trying these basic functions, we find that the sigmoid functions fit our measurements much better. Therefore, we employ sigmoid functions in this paper, and report their R-square values in Table 1. The R-square values indicate how closely the sigmoid functions follow the real measurements: the deviation is smaller as the R-square value approaches 1. Hence, Table 1 shows that sigmoid functions model the VM measurement results very well.
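To illustrate the fitting procedure, the sketch below (in Python, with made-up measurements rather than our data) evaluates how well an assumed sigmoid model y(v) = a1/(1 + e^(−a2·v + a3)) tracks per-VM-count measurements via the R-square metric:

```python
import math

def sigmoid(v, a1, a2, a3):
    """Sigmoid regression model used for CPU/GPU load, frame rate, and delay:
    y(v) = a1 / (1 + exp(-a2*v + a3)), where v is the number of co-located VMs."""
    return a1 / (1.0 + math.exp(-a2 * v + a3))

def r_square(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_tot = sum((y - mean) ** 2 for y in measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    return 1.0 - ss_res / ss_tot

# Hypothetical frame-rate measurements for 1, 2, 4, and 8 VMs (not the
# paper's data), compared against an assumed fitted model (a2 < 0 makes
# the curve decrease as more VMs share the server).
vms = [1, 2, 4, 8]
measured = [47.0, 43.0, 29.0, 3.0]
predicted = [sigmoid(v, 50.0, -0.8, -3.5) for v in vms]
print("R-square:", round(r_square(measured, predicted), 4))
```

An R-square near 1 indicates the sigmoid closely follows the measurements, mirroring how Table 1 is read.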
The precise fitted sigmoid models are detailed in Sec. 4.2, and the empirically derived parameters are used in Secs. 6 and 7. We acknowledge that the model parameters depend not only on the pairs of game/VM but also on game server specifications and operating systems. This however is not a serious concern, as cloud gaming providers are likely to build data centers with one or very few types of machines, which can be profiled offline beforehand. In extreme cases where the physical servers are more heterogeneous, our measurement approach may adopt online regression for incremental adaptations.

4  VM Placement Problem and Solution

We study the provider-centric problem in this section.

4.1  System Overview

Fig. 1 illustrates the system architecture of the cloud gaming platform, which consists of S physical servers, P gamers, and a broker. Each physical server hosts several VMs, while every VM runs a game and a game server (GS). Several physical servers are mounted on a rack, and multiple racks are connected to an aggregation switch. The aggregation switches are then connected to the Internet via a core switch. Physical servers are distributed in several data centers at diverse locations. The gamers run game clients on desktops, laptops, mobile devices, and set-top boxes to access cloud games via the Internet.
The broker is the core of our proposal. It consists of a resource monitor and implements the VM placement algorithm. It is responsible for: (i) monitoring the server workload and network conditions, and (ii) placing the VMs of individual gamers on physical servers to achieve the tradeoff between QoE and cost that is most suitable for the cloud gaming service. In particular, for public cloud gaming services, the provider's profit is more important, while for closed cloud gaming services, the gaming QoE is more critical. We study the former case in this section, and will consider the latter case in Sec. 5. The games may have diverse resource requirements, including CPU, GPU, and memory [28], while the paths between gamers and their associated servers have heterogeneous network resources, such as latency and bandwidth. Moreover, gamers can tolerate different QoE levels for different game genres [29]. Last, we note that the broker can be a virtual service running on a server or a server farm for higher scalability.
Table 2: Symbols Used Throughout This Paper
Sym.       Description
S          Number of physical servers
P          Number of gamers
s          Index of a physical server
p          Index of a gamer
es,p       Round-trip time between physical server s and gamer p
v          Number of VMs running on physical server s
fp(v)      Frame rate when serving gamer p with a server running v VMs
dp(v)      Processing delay when serving gamer p with a server running v VMs
α          Frame rate model parameter
β          Processing delay model parameter
us(v)      CPU utilization of server s running v VMs
zs(v)      GPU utilization of server s running v VMs
δ          CPU utilization model parameter
ζ          GPU utilization model parameter
gp         Hourly fee paid by gamer p
ws(v)      Operational cost of CPU and GPU
cs         Cost term consisting of various components
G          Memory size of each VM
Gs         Memory size of physical server s
B          Streaming bit rate of GA server
W          Number of data centers
w          Index of a data center
Sw         Set of servers in data center w
d̃s,p(v)    Sum of processing delay, network delay, and playout delay
qp         Gaming QoE degradation
γ          QoE degradation model parameter
Qp         Maximal tolerable QoE degradation of gamer p
xs,p       Decision variable of the problem formulation
t1         Start time of migration
t2         Start time of synchronization before end of migration
t3         End time of migration
D          Probability that each gamer joins (leaves) a game session
ω          Normalized migration overhead

4.2  Notations and Models

Table 2 gives the symbols used in this paper. We study the VM placement problem, in which the VM placement decisions affect network delay, processing delay, and operational cost. We write the network delay between server s (1 ≤ s ≤ S) and gamer p (1 ≤ p ≤ P) as es,p, which is essentially the round-trip time between them. The es,p values may be measured by various network diagnostic tools, such as Ping and King [30]. We use fp(v) and dp(v) to denote the frame rate and processing delay when serving gamer p with a server running v VMs, which depend on the game played by p. Fig. 3 reveals that sigmoid functions can model fp(v) and dp(v) well, and we write them as fp(v) = αp,1 / (1 + e^(−αp,2 v + αp,3)) and dp(v) = βp,1 / (1 + e^(−βp,2 v + βp,3)), where αp,1-αp,3 and βp,1-βp,3 are model parameters derived from regression. Furthermore, we use us(v) and zs(v) to model the CPU and GPU utilizations of server s running v VMs. Fig. 3 shows that us(v) and zs(v) can also be written as sigmoid functions us(v) = δ1 / (1 + e^(−δ2 v + δ3)) and zs(v) = ζ1 / (1 + e^(−ζ2 v + ζ3)), where δ13 and ζ13 are the model parameters. We denote gp as the hourly fee paid by gamer p. We let ws(v) = cs(us(v) + zs(v)) be the operational cost of imposing CPU and GPU utilizations us(v) and zs(v) on s, where cs is a cost term consisting of various components, such as electricity, maintenance, and depreciation. Moreover, we allocate G GB of memory to each VM, whereas physical server s is equipped with Gs GB of memory. Last, we consider GA servers that stream at B kbps. We let W be the number of data centers, and use Sw (1 ≤ w ≤ W) to denote the set of servers in data center w. We let Bw be the uplink bandwidth of data center w (1 ≤ w ≤ W). Our bandwidth model is general, as the mapping between servers and data centers is flexible. For example, if the last-mile links are the bottleneck, we may create a virtual data center for each server, such that |Sw| = 1, ∀w.
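For concreteness, the sigmoid performance models and the cost term ws(v) can be coded as follows; the parameter tuples are illustrative placeholders, not the regression results from our measurements:

```python
import math

def sigmoid(v, c1, c2, c3):
    # Generic sigmoid model y(v) = c1 / (1 + exp(-c2*v + c3)) of the
    # number of VMs v consolidated on a physical server.
    return c1 / (1.0 + math.exp(-c2 * v + c3))

# Illustrative (made-up) model parameters, one tuple per metric.
ALPHA = (50.0, -0.8, -3.5)   # frame rate f_p(v) in fps, decreasing in v
BETA = (80.0, 0.9, 4.0)      # processing delay d_p(v) in ms, increasing in v
DELTA = (1.0, 0.9, 2.0)      # CPU utilization u_s(v), in [0, 1]
ZETA = (1.0, 0.7, 1.5)       # GPU utilization z_s(v), in [0, 1]

def operational_cost(v, c_s=2.0):
    """w_s(v) = c_s * (u_s(v) + z_s(v)): the cost of the CPU and GPU load
    that v consolidated VMs impose on physical server s; c_s = 2 matches
    the default used later in the simulations (Sec. 7.1)."""
    return c_s * (sigmoid(v, *DELTA) + sigmoid(v, *ZETA))
```

The broker can evaluate these closed-form models in constant time when weighing candidate placements.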
We next model the QoE of cloud gaming. Recent studies [11,12] suggest that the response time of user inputs directly affects QoE levels. The response time d̃s,p(v) is the sum of processing delay, network delay, and playout delay. The playout delay is the time duration of receiving, decoding, and displaying a frame at the client. Since playout delay is not affected by VM placements, we do not include it in our model for brevity, and write d̃s,p(v) = dp(v) + es,p. We generalize the QoE models in Lee et al. [11,12] to be a function of both response time and frame rate. More specifically, we let qp(fp, d̃s,p) be the gaming QoE degradation observed by gamer p with frame rate fp and response time d̃s,p. Inspired by the linear QoE model in [11], we write qp(fp, d̃s,p) = γp,1 fp + γp,2 d̃s,p, where γp,1 and γp,2 are model parameters that can be derived by the methodology presented in Lee et al. [11]. Last, we use Qp to denote the maximal tolerable QoE degradation of gamer p.
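The linear QoE degradation model and the tolerance check can be sketched as follows; the default γ values are the ones used later in the simulations (Sec. 7.1), while the sample frame rate and delays are arbitrary:

```python
def response_time(d_p, e_sp):
    # Response time d~_{s,p} = processing delay + network round-trip time
    # (playout delay is placement-independent and omitted, as in the text).
    return d_p + e_sp

def qoe_degradation(frame_rate, resp_time, gamma1=-0.1, gamma2=0.1):
    """Linear QoE degradation q_p = gamma1 * f_p + gamma2 * d~_{s,p}.
    gamma1 is negative (a higher frame rate lowers degradation) and
    gamma2 is positive (a longer response time raises it)."""
    return gamma1 * frame_rate + gamma2 * resp_time

def acceptable(frame_rate, resp_time, q_max):
    # Admission test against gamer p's maximal tolerable degradation Q_p.
    return qoe_degradation(frame_rate, resp_time) <= q_max
```

A placement is QoE-feasible for gamer p only if `acceptable(...)` holds with the frame rate and response time induced by that placement.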

[1] for each gamer p = 1, 2, …, P
[2]   sort servers on network latency to p in asc. order
[3]   for each server s in the sorted order
[4]     if placing p on s violates no constraints in Eqs. (2)-(9)
[5]       let xs,p = 1
[6]       break
[7] return x
Figure 4: The pseudocode of the QDH algorithm.

4.3  Problem Formulation

We let xs,p ∈ {0, 1} (1 ≤ p ≤ P, 1 ≤ s ≤ S) be the decision variables, where xs,p=1 if and only if gamer p is served by a VM on server s. With the notations defined above, we formulate the provider-centric problem as:
max   ∑p=1..P gp − ∑s=1..S ws(vs)    (1)
s.t.  fp = αp,1 / (1 + e^(−αp,2 ∑s=1..S (xs,p vs) + αp,3)),  ∀p;    (2)
      d̃p = βp,1 / (1 + e^(−βp,2 ∑s=1..S (xs,p vs) + βp,3)) + ∑s=1..S es,p xs,p,  ∀p;    (3)
      vs = ∑p=1..P xs,p,  ∀s;    (4)
      ∑s=1..S xs,p = 1,  ∀p;    (5)
      γp,1 fp + γp,2 d̃p ≤ Qp,  ∀p;    (6)
      B ∑s∈Sw ∑p=1..P xs,p ≤ Bw,  ∀w;    (7)
      G ∑p=1..P xs,p ≤ Gs,  ∀s;    (8)
      xs,p ∈ {0, 1},  1 ≤ s ≤ S, 1 ≤ p ≤ P.    (9)
The objective function in Eq. (1) maximizes the provider's net profit, i.e., the difference between the collected fees and the operational cost. Eqs. (2) and (3) derive the frame rate and response time as intermediate variables. In Eq. (4), we define another intermediate variable vs to keep track of the number of VMs on each server s, and we evenly allocate the cores among all VMs on a server. Eq. (5) ensures that each gamer is served by exactly one server. Eq. (6) makes sure that the gaming QoE degradation is lower than the user-specified maximal tolerable level. Eqs. (7) and (8) impose bandwidth and memory constraints on each data center and server, respectively. In summary, the formulation maximizes the provider's profit while serving each gamer with a (user-specified) just-good-enough QoE level.
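To make the formulation concrete, a toy brute-force search in the spirit of OPT can enumerate all placements for a tiny instance; the fee list, per-server cost function, and single capacity bound below are simplifications standing in for Eqs. (2)-(8), not our actual solver:

```python
from itertools import product

def net_profit(assign, fees, cost_per_server):
    """Profit of one placement: collected fees minus the cost of every
    powered-on server, where cost_per_server(v) plays the role of w_s(v)."""
    load = {}
    for p, s in enumerate(assign):  # assign[p] = index of p's server
        load[s] = load.get(s, 0) + 1
    return sum(fees) - sum(cost_per_server(v) for v in load.values())

def opt_placement(num_servers, fees, cost_per_server, capacity):
    """Exhaustive search over all gamer-to-server assignments (exponential
    in the number of gamers, hence only usable for benchmarking tiny
    instances). `capacity` is a single per-server VM bound standing in
    for the memory constraint of Eq. (8)."""
    best, best_profit = None, float("-inf")
    for assign in product(range(num_servers), repeat=len(fees)):
        if any(assign.count(s) > capacity for s in range(num_servers)):
            continue  # infeasible: server over capacity
        profit = net_profit(assign, fees, cost_per_server)
        if profit > best_profit:
            best, best_profit = assign, profit
    return best, best_profit
```

With three gamers paying 1 each and a per-server cost of 1 + 0.2v, the search consolidates all VMs on one server whenever the capacity allows, exactly the consolidation pressure the objective creates.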

4.4  Proposed Algorithm

The provider-centric formulation in Eqs. (1)-(9) can be optimally solved using optimization solvers, such as CPLEX [31]. We refer to the solver-based algorithm as OPT. The OPT algorithm gives optimal solutions at the expense of exponential computation complexity. Therefore, we use OPT for benchmarking and propose an efficient heuristic algorithm, called Quality-Driven Heuristic (QDH), below.
The QDH algorithm is built upon an intuition: it is desirable to consolidate more VMs on a server as long as the user-specified maximal tolerable QoE degradation is not exceeded. Fig. 4 illustrates the pseudocode of the QDH algorithm. For each gamer, the algorithm first sorts all servers on the network latency to that gamer. It then iterates through the servers in ascending order and creates a VM for the gamer on the first server that can support this gamer without violating the constraints in Eqs. (2)-(9). It is clear that the QDH algorithm runs in polynomial time.
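The heuristic can be sketched in a few lines of Python; here `fits(s, p, placement)` is a caller-supplied callback abstracting the feasibility checks of Eqs. (2)-(9), and the latency table and capacity rule in the example are made up:

```python
def qdh(gamers, servers, latency, fits):
    """Quality-Driven Heuristic (sketch): for each gamer, scan servers in
    ascending order of network latency and place the gamer's VM on the
    first server that passes all feasibility checks."""
    placement = {}  # placement[p] = chosen server for gamer p
    for p in gamers:
        for s in sorted(servers, key=lambda srv: latency[srv][p]):
            if fits(s, p, placement):
                placement[p] = s
                break  # first feasible (lowest-latency) server wins
    return placement

# Toy example: two servers, each able to host one VM; latencies in ms.
latency = {"s1": {"p1": 30, "p2": 10}, "s2": {"p1": 5, "p2": 40}}
placement = qdh(["p1", "p2"], ["s1", "s2"], latency,
                lambda s, p, pl: list(pl.values()).count(s) < 1)
```

Each gamer triggers one sort plus one linear scan, so the whole pass is polynomial in the numbers of gamers and servers.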

[1] for each gamer p = 1, 2, …, P
[2]   sort servers on quality degradation qp(·) in asc. order
[3]   for each server s in the sorted order
[4]     if placing p on s violates no constraints in Eqs. (2)-(5), (7)-(8)
[5]       let xs,p = 1
[6]       break
[7] return x

Figure 5: The pseudocode of the QDH′ algorithm.

5  Alternative Formulation and Algorithms for Closed Systems

The provider-centric problem presented in Sec. 4 is suitable for public cloud gaming services. For closed cloud gaming services, e.g., in hotels, Internet cafes, and amusement parks, maximizing the overall QoE is more important, as the network bandwidth is dedicated to cloud gaming. Therefore, we present the gamer-centric formulation and algorithms in this section. We start from the provider-centric formulation in Eqs. (1)-(9) and first replace the objective function in Eq. (1) with:
min   ∑p=1..P γp,1 fp + ∑p=1..P γp,2 d̃p,    (1)
which minimizes the total QoE degradation. In particular, the QoE degradation is reduced when fp increases or d̃p decreases, as the empirically derived γp,1 is negative and γp,2 is positive. Next, we remove the constraints in Eq. (6), as the new objective function has taken the QoE into consideration. This yields the gamer-centric problem formulation. We develop a solver-based algorithm for the gamer-centric formulation, which is referred to as OPT′.
We also propose an alternative to QDH for the gamer-centric problem, called QDH′. Fig. 5 illustrates the heuristic algorithm. For each gamer, the algorithm first computes the quality degradation the gamer would experience on each server. It then sorts the servers on that quality degradation, iterates through them, and creates a VM for the gamer on the first server that can support the gamer without violating any constraints in Eqs. (2)-(5) and (7)-(8). QDH′ runs in polynomial time.
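A sketch of the gamer-centric variant: the control flow matches QDH, but candidate servers are ranked by the QoE degradation qp(·) the gamer would experience rather than by raw latency. `degradation[s][p]` and `fits` are assumed to be precomputed/supplied by the caller:

```python
def qdh_prime(gamers, servers, degradation, fits):
    """QDH' (sketch): place each gamer's VM on the feasible server that
    yields the lowest QoE degradation for that gamer. `fits` abstracts
    the remaining constraints, Eqs. (2)-(5) and (7)-(8)."""
    placement = {}
    for p in gamers:
        for s in sorted(servers, key=lambda srv: degradation[srv][p]):
            if fits(s, p, placement):
                placement[p] = s
                break  # lowest-degradation feasible server wins
    return placement
```

Swapping only the sort key is what shifts the optimization target from the provider's profit to the gamers' experience.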
system_prototype.png
Figure 6: The implemented prototype system.
testbed.png
Figure 7: The cloud gaming testbed in our lab.

6  System Implementation and Testbed

We conduct small-scale evaluations using a real testbed in this section.

6.1  Prototype Implementation

We have implemented a complete cloud gaming system consisting of a broker, physical servers, and GA servers/clients, as illustrated in Fig. 6. We adopt VMware ESXi 5.1 as the virtualization software on physical servers. ESXi allows us to create VMs on physical servers, and each VM hosts a GA server and a game chosen by the corresponding gamer. We employ VMware vCenter 5.1 as the platform for our broker, which comprises Single-Sign-On for user authentication and Inventory Service for managing/monitoring the VMs on ESXi servers. The Inventory Service comes with different APIs, and we use its Java API to interface with the vCenter on the broker so as to control the ESXi servers on all physical machines.
Fig. 6 shows the flow of our system. We integrate the GA client and server with VMware ESXi and vCenter. In particular, the GA client provides an interface for gamers to send their accounts and passwords to the broker (1). Upon being authenticated (2), the GA client sends the user-specified game to the broker, and the broker determines where to create a new VM for that game based on the status of all physical servers and networks (3). The broker then instructs the chosen physical server to launch a VM (4) and sends the VM's IP address to the GA client (5, 6). Last, the GA client connects to the GA server (7), instructs the GA server to run the user-specified game (8), and the GA server streams the game to the GA client (9). This starts a new GA game session.
live_migration.png
Figure 8: Live migration.
adaptive_QDH_profit3.png (a)
        adaptive_QDH_quality3.png (b)
Figure 9: Comparisons between QDHL/QDH′L and QDH/QDH′: (a) net profits and (b) quality.

6.2  Testbed and Practical Concerns

We set up a testbed using the prototype system in our lab, which is shown in Fig. 7. The testbed contains an i7 3.2 GHz broker with the management web page, several i5 3.5 GHz physical servers with NVidia Quadro 6000 cards, and several i5 client computers. The broker, physical servers and client computers are connected via Gigabit Ethernet. We enable the CPU hardware support for virtualization and conduct the following experiments.
We measure the overhead of launching a VM running Windows 7 and compare it against that of natively booting up Windows 7 on the same machine. We find that both take about 50 s, showing little additional overhead due to virtualization. We next measure the overhead of live migrations. Fig. 8 illustrates a sample migration of a gamer from the source VM to the destination VM, where the areas with virtual patterns indicate the VM currently used by the gamer. More generally, there are two types of migrations: (i) live migration and (ii) stop-and-copy [32]. With live migration, the memory and disk pages of the source VM are first copied over to the destination VM. Meanwhile, the gamer is still connected to the source VM and may produce dirty memory and disk pages. The system iteratively copies those dirty pages to the destination VM until either there are no more dirty pages, or the number of new dirty pages exceeds the number of copied dirty pages. Next, the synchronization starts: (i) the system freezes both VMs, (ii) the system copies over the remaining pages, and (iii) the gamer is then served by the destination VM. We let t1 and t3 be the start and end times of the migration procedure, and t2 be the time the synchronization starts. Using these notations, live migration copies pages between t1 and t2 without stopping gamers from using the VMs, and freezes the VMs between t2 and t3. Our testbed supports live migration, and we conduct diverse experiments to quantify the migration overhead.
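The iterative pre-copy phase and its stopping rule can be sketched with a toy page-counting model; all numbers are illustrative page counts and per-round rates, not measured values:

```python
def precopy_rounds(initial_dirty, dirty_rate, copy_rate):
    """Toy model of the iterative pre-copy phase (t1 to t2) of a live
    migration. Each round copies up to `copy_rate` dirty pages while the
    still-running game dirties `dirty_rate` new ones; iteration stops
    when no dirty pages remain or when dirtying outpaces copying. The
    leftover pages are then copied while both VMs are frozen (t2 to t3)."""
    dirty, rounds = initial_dirty, 0
    while dirty > 0:
        copied = min(dirty, copy_rate)
        if dirty_rate >= copied:   # new dirty pages outpace copied ones
            break
        dirty = dirty - copied + dirty_rate
        rounds += 1
    return rounds, dirty   # `dirty` is what the freeze must transfer
```

The second stop condition is what keeps the frozen interval (t2 to t3) short: the system gives up pre-copying once another round would no longer shrink the dirty set.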
We discover that the live-migration times (t1 to t3) of 20, 30, and 40 GB VM images are about 6, 9, and 11 minutes in our testbed. In addition, the frozen time (t2 to t3) is always less than 3 seconds. These three VM image sizes roughly correspond to the three considered games: Limbo, PSR, and Normandy. Given that the migration times are nontrivial, recomputing the VM placements for all gamers (including those with ongoing sessions) may lead to unacceptable QoE degradation even with live migration. The major cause of the QoE degradation is the duplicated resource reservation: when migrating a gamer, both the source and destination VMs consume resources, as shown in Fig. 8. Hence, we propose a migrationless version of the QDH/QDH′ algorithms, which avoids the degradation caused by migration by never migrating running VMs. We denote the new algorithms as QDHL/QDH′L; they only intelligently place the VMs of incoming gamers that have not yet started their game sessions. That is, by getting rid of the outermost loops in Figs. 4 and 5, we never migrate ongoing game sessions. Intuitively, QDHL/QDH′L run faster, yet achieve better performance, as the high migration time is avoided. Moreover, we will show in the evaluation sections (Secs. 6.3 and 7) that QDH′L is an optimal migrationless algorithm.
adaptive_migrationOverhead_profit.png (a)
        adaptive_migrationOverhead_quality.png (b)
Figure 10: Comparisons between QDHL/QDH′L and QDH/QDH′: (a) net profits and (b) quality.
adaptive_opt_profit1.png (a)        
adaptive_opt_quality1.png (b)
Figure 11: Comparisons between QDHL/QDH′L and OPTL/OPT′L: (a) net profits and (b) quality.

6.3  Experiment-Performance Gains of the Migrationless Algorithms

Setup. To quantify the performance of the QDHL/QDH′L algorithms, we employ a testbed with 9 physical servers, 15 gamers, and three games: Limbo, PSR, and Normandy. Every minute, each gamer joins (leaves) a game session with a probability of D% (1−D%), where D is a system parameter. Each simulation lasts for T minutes. We assume that each physical server can serve up to two VMs and each VM launches a randomly selected game. In each simulation, we measure the frame rate and processing delay, and use them in the quality model. We also measure the CPU and GPU utilizations, and use them in the profit model. We inject realistic network latency (see Sec. 7.1) using dummynet [33]. Last, we set D=90% and T=15 minutes, and consider two performance metrics: net profit and QoE level.
Results. We make three observations on the performance of the QDHL/QDH′L algorithms. First, QDHL/QDH′L outperform QDH/QDH′. In particular, Figs. 9(a) and 9(b) show that the gains of QDHL/QDH′L over QDH/QDH′ are up to 396 dollars in net profit and 4% in QoE. A closer look reveals that the performance gains are due to the high migration overhead.
Due to increasingly higher computing power, the migration overhead will gradually be reduced, and the performance gains of QDHL/QDH′L may diminish. To better understand the trend, we let ω be the normalized migration overhead, where 0 ≤ ω ≤ 1. For example, setting ω = 1/3 means the migration overhead becomes 1/3 of the current one. We vary ω and plot the average results in Fig. 10. This figure reveals that the migrationless algorithms QDHL/QDH′L still outperform the ordinary algorithms QDH/QDH′ even when ω = 25% and ω = 5%. However, such a steep technology advance is unlikely to become a reality in the short term. Hence, we no longer consider the QDH/QDH′ algorithms in the rest of this paper.
Last, we compare the QDHL/QDH′L algorithms against the migrationless optimal solution that exhaustively checks all servers for each new gamer. We refer to the migrationless optimal solutions as OPTL/OPT′L. Fig. 11 reports the average performance over time. Fig. 11(a) shows that QDHL and OPTL result in similar net profit.
More specifically, the OPTL algorithm outperforms the QDHL algorithm in the first half of the experiment, but QDHL occasionally performs better in the second half. A closer look indicates that, since both algorithms are migrationless, game sessions, once started, run until their gamers leave. Therefore, even though OPTL selects the best VM placements for incoming gamers, it cannot foresee the future (e.g., when gamers will leave), and thus its profit may occasionally be lower than that of QDHL. Nonetheless, the overall profit of QDHL is merely 10% lower than the optimum. Fig. 11(b) reveals that QDH′L achieves exactly the same (optimal) QoE performance as OPT′L. Fig. 11 thus demonstrates the merits of QDHL and QDH′L.
adaptive_QDH_profit1.png (a)
adaptive_QDH_usedserverNum1.png (b)
Figure 12: Provider-centric simulation results with synthetic traces: (a) net profits and (b) used servers.
adaptive_QDH_quality1.png
Figure 13: Gamer-centric simulation results with synthetic traces.

7  Trace-Driven Simulations

In this section, we consider large-scale evaluations using detailed simulations.

7.1  Setup

adaptive_QDH_quality_barchart.png
Figure 14: Fairness in QoE levels on different game genres.
We have built a simulator for the VM placement problem using a mixture of C/C++, Java, and Matlab, and implemented the QDHL/QDH′L algorithms in it. For comparisons, we have also implemented a VM placement algorithm that places each VM on a random game server that is not fully loaded in the data center geographically closest to the gamer. This baseline algorithm is referred to as the Location Based Placement (LBP) algorithm. We collect gamer and server IP addresses and the latency between each gamer/server IP pair in order to drive our simulator. For servers, we use DigSitesValue [34] to obtain the IP addresses of OnLive data centers in Virginia, California, and Texas. For gamers, we develop a BitTorrent crawler using libtorrent [35] to collect peer IP addresses and then use them as gamer IP addresses. Since OnLive only hosts game servers in the US, we filter out non-US gamer IP addresses using ip2c [36]. We ran our crawler on August 13, 2013 with 4494 torrents downloaded from IsoHunt [37], which gave us 22395 IP addresses, 5875 of which are in the US. Next, we measure the network latencies among gamer/server IP pairs using King [30], since we have no control over either end system. We drop the IP addresses without complete latency results to all servers, which leaves 412 gamer IP addresses.
adaptive_QDH_profit2.png (a)
adaptive_QDH_usedserverNum2.png (b)
adaptive_QDH_quality2.png (c)
Figure 15: Simulation results with WoW traces: (a) net profits, (b) used servers, and (c) QoE levels.
adaptive_QDH_runningTime.png
Figure 16: Running time of VM placement algorithms.
adaptive_QDH_profit_scatter.png (a) adaptive_QDH_quality_scatter.png (b)
Figure 17: Impacts of number of gamers on: (a) net profits and (b) QoE levels.
We conduct a three-day simulation for each scenario using different algorithms. The gamers arrive at the broker following a Poisson process with a mean inter-arrival time of 4 minutes, and each gamer plays for a duration uniformly chosen from {300, 600, 1200, 2400, 4800} minutes. In addition to the synthetic gamer arrival traces, we also employ real World of Warcraft (WoW) traces [38] in our simulations. Each gamer plays a game randomly chosen from Limbo, PSR, and Normandy. We vary the number of servers S ∈ {192, 384, 768, 1536, 3072} and set the migration overheads to 6, 9, and 11 minutes for Limbo, PSR, and Normandy, respectively. During each simulation, we run the scheduling algorithm once every minute, and we report the mean performance results among all gamers, with 95% confidence intervals whenever applicable. If not otherwise specified, we set S=192, γp,1 = −0.1, γp,2 = 0.1, gp=1, and cs=2. We conduct all the simulations on an Intel i7 3.4 GHz PC. We consider performance metrics including the net profit, the number of used servers, and the achieved QoE level.
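The synthetic workload above can be reproduced with a short generator. This is a sketch under the stated parameters (Poisson arrivals with a 4-minute mean inter-arrival time, durations uniform over the given set, uniformly random game); the function name and event layout are our own, not from the paper's simulator.

```python
import random

def synthetic_trace(sim_minutes, mean_interarrival=4.0,
                    durations=(300, 600, 1200, 2400, 4800),
                    games=('Limbo', 'PSR', 'Normandy'), seed=None):
    """Generate gamer arrival events for one simulation run.

    Arrivals follow a Poisson process, i.e., exponentially distributed
    inter-arrival times with the given mean (in minutes). Each event
    carries a play duration drawn uniformly from `durations` and a game
    drawn uniformly from `games`.
    """
    rng = random.Random(seed)
    events, t = [], 0.0
    while True:
        t += rng.expovariate(1.0 / mean_interarrival)  # next arrival
        if t >= sim_minutes:
            break
        events.append({'arrival': t,
                       'duration': rng.choice(durations),
                       'game': rng.choice(games)})
    return events

# A three-day trace, as in the simulations:
trace = synthetic_trace(3 * 24 * 60, seed=1)
```

With a 4-minute mean inter-arrival time, a three-day (4320-minute) run yields on the order of a thousand arrivals, which the scheduler then processes once per simulated minute.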

7.2  Results

Performance of QDHL/QDH′L. We plot the provider-centric results in Fig. 12(a), which shows that QDHL significantly outperforms LBP: up to 3.5 times difference. This can be explained by Fig. 12(b), which shows that QDHL turns on fewer servers to achieve higher net profits. We plot the gamer-centric results in Fig. 13, which reveals that QDH′L consistently outperforms LBP: up to a 5% QoE gap. Moreover, the confidence intervals show that QDH′L leads to more consistent QoE levels among individual gamers, achieving better fairness. Fig. 14 plots the aggregate QoE of the three games, which shows that both QDH′L and LBP are relatively fair to different game genres, while QDHL maximizes the net profits by devoting more resources to less complicated games.
Performance results from WoW traces. In the following, we report results from the WoW traces. We plot the provider-centric results in Fig. 15(a), which shows that QDHL always outperforms LBP, with a difference of up to more than 20 thousand dollars; in the first quarter of the simulation, LBP even runs a large deficit. This can be explained by Figs. 15(b) and 17(b), which reveal that QDHL shuts down more servers while still keeping all gamers at 60+% QoE, the just-good-enough QoE level. Fig. 15(c) shows that QDH′L outperforms LBP by up to 130%.
Table 3: Running Time in Seconds
                        QDHL              QDH′L
Number of servers    Mean    Max       Mean    Max
5000                 0.215   0.853     0.02    0.05
10000                0.379   0.967     0.05    0.07
15000                0.557   1.9       0.07    0.12
20000                0.819   2.52      0.12    0.23
Scalability. We plot the running time in Fig. 16, which shows that the QDHL/QDH′L algorithms terminate in real time: < 1.5 ms. We then increase the number of servers S and report the average running time under two assumptions: (i) the gamers from the WoW trace never leave their game sessions, and (ii) the WoW trace is repeated 30 times to scale up the total number of gamers in our system. Table 3 shows that QDHL/QDH′L take at most 2.5 s to solve a VM placement problem with more than 20000 servers and 40000 gamers, which is relatively short compared to the initialization time of modern computer games.
Number of gamers. The number of gamers in the WoW traces varies over time, and we present two scatter plots in Figs. 17(a) and 17(b) to study the relation between performance and the number of gamers. These figures show that more gamers lead to higher net profits and lower QoE levels, and that QDHL/QDH′L successfully achieve their design objectives.

8  Conclusion and Future Work

We studied the VM placement problems for maximizing: (i) the total net profit for service providers while maintaining just-good-enough gaming QoE, and (ii) the overall gaming QoE for gamers. The former problem is more suitable for public cloud gaming systems, while the latter problem is more suitable for closed systems. We conducted extensive experiments using a real cloud gaming system [14] and two VMs to derive various system models. We formulated the two problems as optimization problems, and proposed optimal and efficient algorithms to solve them. Via testbed experiments and extensive trace-driven simulations, we demonstrated that: (i) the efficient algorithms achieve up to 90% (provider-centric) and 100% (gamer-centric) performance compared to the optimal algorithms, (ii) the efficient algorithms consistently outperform the state-of-the-art algorithm, e.g., by up to 3.5 times in net profits, and (iii) the efficient algorithms terminate in < 2.5 s on a commodity PC even for large services with 20000 servers and 40000 gamers.
This work can be extended in several directions. For example, we may develop more comprehensive system models, which may take other types of resources and heterogeneous server types into consideration, and support online parameter adaptation.

Appendix: Virtualization

Many different virtualization technologies, such as JVM, virtual storage, and virtual machines, have been proposed in the literature [39,40,41,42]. These technologies can be roughly classified into six kinds.
In Fig. 18, we sort these virtualization techniques according to their level of virtualization, where techniques on the right have higher virtualization levels. The aforementioned six techniques can be further grouped into two types, as illustrated in Fig. 19. The first type (Fig. 19(a)) enables multiple guests running on the same OS kernel; therefore, the kernel calls from different guests may affect each other. The AV, RV, and OSLV belong to this type. The second type (Fig. 19(b)) isolates guests from each other by running each guest in its own OS.
Our cloud gaming testbed is built upon type-2 HVM solutions, such as VMware, which are more flexible at the expense of slightly higher virtualization overhead. To reduce the virtualization overhead, type-1 solutions may be adopted, which however may require more customizations in game software.
virtualization_level.png
Figure 18: Levels of virtualization.
type1.png (a) type2.png (b)
Figure 19: Two types of virtualization: (a) multiple guests with a single operating system and (b) multiple guests with individual operating systems.

References

[1] H.-J. Hong, D.-Y. Chen, C.-Y. Huang, K.-T. Chen, and C.-H. Hsu, "QoS-aware virtual machine placement for cloud games," in Proc. of ACM Annual Workshop on Network and Systems Support for Games (NetGames'13), Denver, CO, December 2013.
[2] P. Ross, "Cloud computing's killer app: Gaming," IEEE Spectrum, vol. 46, no. 3, p. 14, March 2009.
[3] "Distribution and monetization strategies to increase revenues from cloud gaming," http://www.cgconfusa.com/report/documents/Content-5minCloudGamingReportHighlights.pdf.
[4] "Cloud gaming adoption is accelerating ... and fast!" http://www.nttcom.tv/2012/07/09/cloud-gaming-adoption-is-acceleratingand-fast/.
[5] "OnLive launches new company to avoid bankruptcy," http://techland.time.com/2012/08/20/onlive.
[6] F. Bari, R. Boutaba, R. Esteves, M. Podlesny, G. Rabbani, Q. Zhang, F. Zhani, and L. Granville, "Data center network virtualization: A survey," IEEE Communications Surveys & Tutorials, vol. 15, no. 2, pp. 909-928, 2012.
[7] M. Yu, Y. Yi, J. Rexford, and M. Chiang, "Rethinking virtual network embedding: Substrate support for path splitting and migration," ACM SIGCOMM Computer Communication Review, vol. 38, no. 2, pp. 17-29, April 2008.
[8] X. Cheng, S. Su, Z. Zhang, H. Wang, F. Yang, Y. Luo, and J. Wang, "Virtual network embedding through topology-aware node ranking," ACM SIGCOMM Computer Communication Review, vol. 41, no. 2, pp. 38-47, April 2011.
[9] X. Meng, V. Pappas, and L. Zhang, "Improving the scalability of data center networks with traffic-aware virtual machine placement," in Proc. of IEEE INFOCOM 2010, San Diego, CA, March 2010, pp. 1-9.
[10] N. Chowdhury, M. Rahman, and R. Boutaba, "Virtual network embedding with coordinated node and link mapping," in Proc. of IEEE INFOCOM 2009, Rio de Janeiro, Brazil, April 2009, pp. 783-791.
[11] Y.-T. Lee, K.-T. Chen, H.-I. Su, and C.-L. Lei, "Are All Games Equally Cloud-Gaming-Friendly? An Electromyographic Approach," in Proc. of the ACM SIGCHI International Conference on Advances in Computer Entertainment Technology (ACE'05), October 2012, pp. 117-124.
[12] S. Shi, K. Nahrstedt, and R. Campbell, "Distortion over latency: Novel metric for measuring interactive performance in remote rendering systems," in Proc. of IEEE International Conference on Multimedia and Expo (ICME'11), Barcelona, Spain, July 2011, pp. 1-6.
[13] P. Chen and M. Zark, "Perceptual view inconsistency: An objective evaluation framework for online game quality of experience (QoE)," in Proc. of the Annual Workshop on Network and Systems Support for Games (NetGames'11), Ottawa, Canada, October 2011, pp. 2:1-2:6.
[14] C.-Y. Huang, K.-T. Chen, D.-Y. Chen, H.-J. Hsu, and C.-H. Hsu, "GamingAnywhere: The First Open Source Cloud Gaming System," ACM Transactions on Multimedia Computing Communications and Applications, pp. 1-25, Jan 2014.
[15] S. Zaman and D. Grosu, "A combinatorial auction-based mechanism for dynamic VM provisioning and allocation in clouds," IEEE Transactions on Cloud Computing, vol. 1, no. 2, pp. 129-141, July-December 2013.
[16] J. Lin, C. Chen, and J. Chang, "QoS-aware data replication for data-intensive applications in cloud computing systems," IEEE Transactions on Cloud Computing, vol. 1, no. 1, pp. 101-115, July-December 2013.
[17] M. Marzolla, O. Babaoglu, and F. Panzieri, "Server consolidation in clouds through gossiping," in Proc. of International Symposium on World of Wireless, Mobile and Multimedia Networks (WoWMoM'11), Lucca, Italy, June 2011, pp. 1-6.
[18] T. Ferreto, M. Netto, R. Calheiros, and C. Rose, "Server consolidation with migration control for virtualized data centers," Future Generation Computer Systems, vol. 27, no. 8, pp. 1027-1034, October 2011.
[19] M. Chen, H. Zhang, Y. Su, X. Wang, G. Jiang, and K. Yoshihira, "Effective VM sizing in virtualized data centers," in Proc. of International Symposium on Integrated Network Management (IM'11), Dublin, Ireland, May 2011, pp. 594-601.
[20] B. Speitkamp and M. Bichler, "A mathematical programming approach for server consolidation problems in virtualized data centers," IEEE Transactions on Services Computing, vol. 3, no. 4, pp. 266-278, December 2010.
[21] Q. Zhu and T. Tung, "A performance interference model for managing consolidated workloads in QoS-aware clouds," in IEEE International Conference on Cloud Computing (CLOUD'12), Honolulu, HI, June 2012, pp. 170-179.
[22] R. Nathuji, A. Kansal, and A. Ghaffarkhah, "Q-clouds: Managing performance interference effects for QoS-aware clouds," in Proc. of the European Conference on Computer systems (EuroSys'10), Paris, France, April 2010, pp. 237-250.
[23] Y. Lee and K. Chen, "Is Server Consolidation Beneficial to MMORPG? A Case Study of World of Warcraft," in Proc. of IEEE International Conference on Cloud Computing (CLOUD'10), Miami, FL, February 2010, pp. 435-442.
[24] T. Duong, X. Li, R. Goh, X. Tang, and W. Cai, "QoS-aware revenue-cost optimization for latency-sensitive services in IaaS clouds," in Proc. of IEEE/ACM 16th International Symposium on Distributed Simulation and Real Time Applications (DS-RT'12), Dublin, Ireland, October 2012.
[25] D. Wu, Z. Xue, and J. He, "iCloudAccess: Cost-effective streaming of video games from the cloud with low latency," IEEE Transactions on Circuits and Systems for Video Technology, January 2014, accepted to appear.
[26] K.-T. Chen, Y.-C. Chang, P.-H. Tseng, C.-Y. Huang, and C.-L. Lei, "Measuring The Latency of Cloud Gaming Systems," in Proc. of ACM International Conference on Multimedia (MM'11), Scottsdale, AZ, November 2011, pp. 1269-1272.
[27] M. Kutner, C. Nachtsheim, J. Neter, and W. Li, Applied linear statistical models, 5th ed.    McGraw-Hill Higher Education, 2004.
[28] M. Claypool, "Motion and scene complexity for streaming video games," in Proc. of the International Conference on Foundations of Digital Games (FDG'09), Port Canaveral, FL, April 2009, pp. 34-41.
[29] C. Mark and C. Kajal, "Latency and player actions in online games," Communications of the ACM, vol. 49, no. 11, pp. 40-45, November 2006.
[30] K. Gummadi, S. Saroiu, and S. Gribble, "King: Estimating latency between arbitrary Internet end hosts," in Proc. of ACM SIGCOMM Internet Measurement Workshop (IMW'02), Boston, MA, November 2002, pp. 5-18.
[31] "IBM ILOG CPLEX optimizer," http://www-01.ibm.com/software/integration/optimization/cplex-optimizer/.
[32] C. Clark, K. Fraser, S. Hand, J. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield, "Live migration of virtual machine," Symposium on Networked Systems Design and Implementation, vol. 2, pp. 273-286, 2005.
[33] "DummyNet," http://info.iet.unipi.it/~luigi/dummynet/.
[34] "DigSites," http://digsitesvalue.net/s/onlive.com.
[35] "libtorrent," http://www.rasterbar.com/products/libtorrent/.
[36] "ip2c," http://firestats.cc/wiki/ip2c.
[37] "IsoHunt," http://isohunt.com/.
[38] "World of Warcraft avatar history dataset," http://mmnet.iis.sinica.edu.tw/dl/wowah/.
[39] Y. Li, W. Li, and C. Jiang, "A survey of virtual machine system: Current technology and future trends," in Proc. of International Symposium on Electronic Commerce and Security (ISECS'10), Guangzhou, China, July 2010, pp. 332-336.
[40] S. Nanda and T. Chiueh, "A survey of virtualization technologies," Tech. Rep., February 2005, www.ecsl.cs.sunysb.edu/tr/TR179.pdf.
[41] R. Rose, "Survey of system virtualization techniques," Tech. Rep., March 2004, http://www.robertwrose.com/vita/rose-virtualization.pdf.
[42] S. Osman, D. Subhraveti, G. Su, and J. Nieh, "The design and implementation of zap: a system for migrating computing environments," ACM SIGOPS Operating Systems Review, vol. 36, no. SI, pp. 361-376, December 2002.

Footnotes:

1. A preliminary version of this manuscript [1] appeared in the Proceedings of the 12th Annual Workshop on Network and Systems Support for Games (NetGames 2013) as a 2-page short paper.


Sheng-Wei Chen (also known as Kuan-Ta Chen)
http://www.iis.sinica.edu.tw/~swc 
Last Update July 20, 2017