GamingAnywhere: The First Open Source Cloud Gaming System

Chun-Ying Huang, Kuan-Ta Chen, De-Yu Chen, Hwai-Jung Hsu, and Cheng-Hsin Hsu



GamingAnywhere is now publicly available at http://gaminganywhere.org.

Abstract

We present the first open source cloud gaming system, called GamingAnywhere. In addition to its openness, we design GamingAnywhere for high extensibility, portability, and reconfigurability. We implement GamingAnywhere on Windows, Linux, OS X, and Android. We conduct extensive experiments to evaluate the performance of GamingAnywhere. Our experimental results indicate that GamingAnywhere is efficient, scalable, adaptable to network conditions, and achieves high responsiveness and streaming quality. GamingAnywhere can be employed by the researchers, game developers, service providers, and end users for setting up cloud gaming testbeds, which, we believe, will stimulate more research innovations on cloud gaming systems and applications.
Author's address: C.-Y. Huang, 2 Pei-Ning Road Keelung, Taiwan 20224; email: chuang@ntou.edu.tw; K.-T. Chen, D.-Y. Chen, H.-J. Hsu, 128 Academia Road, Section 2, Nankang, Taipei 11574; email: swc@iis.sinica.edu.tw, r96922083@ntu.edu.tw, hjhsu@iis.sinica.edu.tw; C.-H. Hsu, No. 101, Section 2, Kuang-Fu Road, Hsinchu, Taiwan 30013; email: chsu@cs.nthu.edu.tw

1  Introduction

Cloud gaming systems render the game scenes on cloud servers and stream the encoded game scenes to thin clients over the Internet. The thin clients send the user inputs, from joysticks, keyboards, and mice, to the cloud servers. With cloud gaming systems, users can: (i) avoid upgrading their computers for the latest games, (ii) play the same games using thin clients on different platforms, such as PCs, laptops, tablets, and smartphones, and (iii) play more games due to reduced hardware/software cost, while game developers may: (i) support more platforms, (ii) avoid hardware/software incompatibility issues, and (iii) increase net revenues. Therefore, cloud gaming systems have attracted attention from users, game developers, and service providers. In fact, the market potential of cloud gaming is well recognized, as evidenced by the recent acquisitions of cloud gaming startups, such as GaiKai [Sony 2012] and ESN [EA 2012].
For commercially-successful cloud gaming services, the cloud gaming systems must deliver high-quality videos with low response delay, which is difficult in the best-effort Internet. The response delay refers to the time difference between the thin client receiving a user input and the thin client displaying the game frame reflecting that user input. Higher quality videos, such as 720p (1280x720) at 60 fps (frames per second), inherently lead to higher bit rates, which render cloud gaming systems more vulnerable to network congestion and thus lead to longer response delay. Longer response delay results in worse user experiences, and may turn the users away from cloud gaming services. In fact, user studies reveal that users may quit playing networked games if the response delay is longer than a genre-dependent threshold, as low as 100 ms [Claypool and Claypool 2006]. Considering that game scenes have to go through a pipeline of rendering, capturing, encoding, packetization, transmitting, decoding, and displaying, it is very challenging to design and implement a cloud gaming system for both high video quality and low response delay.
Remote desktop software packages, such as LogMeIn [LogMeIn 2012], TeamViewer [TeamViewer 2012], and UltraVNC [UltraVNC 2012], have been popular for some time, but were not designed for highly interactive applications, and thus do not meet the strict requirements of cloud gaming [Chang et al. 2011]. Although there exist commercial cloud gaming services, e.g., GaiKai [GaiKai 2012], OnLive [OnLive 2012], and StreamMyGame [StreamMyGame 2012], a recent measurement study [Chen et al. 2011] reports that these cloud gaming systems still suffer from high response delay, among other limitations. For example, assuming a small, negligible network latency, mean response delays of 134 ms and 375 ms are measured on the OnLive and StreamMyGame platforms, respectively. Hence, the problem of developing cloud gaming systems for high video quality and low response delay remains open. We consider a major cause of the inferior performance of existing cloud gaming systems to be the lack of an open source cloud gaming system, which would enable research groups to readily implement and evaluate their new ideas for better cloud gaming experiences.
In this article, we present our efforts on designing, implementing, and evaluating GamingAnywhere, which is to the best of our knowledge, the first open source cloud gaming system. GamingAnywhere has three main advantages over other existing systems.
  1. GamingAnywhere is an open system, in the sense that a component of the video streaming pipeline can be easily replaced by another component implementing a different algorithm, standard, or protocol. For example, GamingAnywhere by default uses x264 [x264 2012] and vpxenc [WebM 2013] to encode captured raw videos. To expand GamingAnywhere for stereoscopic games, an H.264/MVC encoder may be plugged into it without significant changes. More generally, various algorithms, standards, protocols, and system parameters can be rigorously evaluated using real experiments, which is impossible on proprietary cloud gaming systems.
  2. GamingAnywhere is cross-platform, and is currently available on Windows, Linux, OS X, and Android (client only). This is made possible largely due to the modularized design of GamingAnywhere.
  3. GamingAnywhere has been designed to be efficient; for example, it minimizes time and space overhead by using shared circular buffers to reduce the number of memory copy operations. These optimizations allow GamingAnywhere to provide a high-quality gaming experience with short response delay. In particular, on a commodity Intel i7 server, GamingAnywhere delivers real-time 720p videos at ≥ 35 fps, which is equivalent to less than 28.6 ms of processing time for each video frame, with a video quality significantly higher than that of existing cloud gaming systems. For example, GamingAnywhere outperforms OnLive by 5 dB in Peak Signal-to-Noise Ratio (PSNR).
This article makes two main contributions. First, we develop an open cloud gaming system, GamingAnywhere, which can be used by cloud gaming developers, cloud service providers, and system researchers for setting up a complete cloud gaming testbed. GamingAnywhere is the first open cloud gaming testbed in the literature. Second, we conduct extensive experiments using GamingAnywhere to quantify its performance and overhead. We also derive the optimal setups of system parameters, which in turn allow users to install and try out GamingAnywhere on their own servers.

1.1  Design Objectives

GamingAnywhere aims to provide an open platform for researchers to develop and study real-time multimedia streaming applications in the cloud. Its objectives are as follows.
  1. Extensibility: GamingAnywhere adopts a modularized design. Both platform-dependent components, such as audio and video capturing, and platform-independent components, such as codecs and network protocols, can be easily modified or replaced. Developers can follow the programming interfaces of the modules in GamingAnywhere to extend the capabilities of the system. GamingAnywhere is not limited to games; any real-time multimedia streaming application, such as live casting, can be supported by the same system architecture.
  2. Portability: In addition to desktop computers, mobile devices are becoming one of the most promising clients of cloud services because of the widespread availability of wireless access and the limited resources available on mobile devices. For this reason, we maintain the principle of portability when designing and implementing GamingAnywhere. Currently the server supports Windows, Linux, and OS X, while the client supports Windows, Linux, OS X, and Android. New platforms can be easily supported by replacing the platform-dependent components in GamingAnywhere. Besides the easily replaceable modules, the external components leveraged by GamingAnywhere are highly portable as well. This also makes GamingAnywhere easier to port to mobile devices. For details, please refer to Section 4.
  3. Configurability: A system researcher may conduct experiments for real-time multimedia streaming applications with diverse system parameters. A large number of built-in audio and video codecs are supported by GamingAnywhere. In addition, GamingAnywhere exposes all available configurations to users, so that they can try out different combinations of parameters, and fit the system to a customized usage scenario, by simply editing a text-based configuration file.
  4. Openness: GamingAnywhere is publicly available at http://gaminganywhere.org/. Use of GamingAnywhere in academic research is free of charge, but researchers and developers should follow the license terms stated in the binary and source packages.

1.2  Paper Organization

The rest of this paper is organized as follows. Section 2 discusses the related work in the literature. Section 3 depicts the system architecture. This is followed by the detailed elaborations of implementation issues in Section 4. Section 5 gives the experimental results. We conclude the paper in Section 6. Last, we empirically determine the best encoding parameters in Appendix A.

2  Related Work

We first survey the existing cloud gaming systems. Then, we review prior work on quantifying the performance of cloud gaming systems.

2.1  Cloud Gaming Systems

We classify the cloud gaming systems into three genres: (i) 3D graphics streaming [Eisert and Fechteler 2008, Jurgelionis et al. 2009], (ii) video streaming [Winter et al. 2006, Holthe et al. 2009], and (iii) video streaming with post-rendering operations [Shi et al. 2011, Giesen et al. 2008]. These three approaches differ from one another in how they divide the workload between the cloud servers and clients.
With the 3D graphics streaming approach [Eisert and Fechteler 2008, Jurgelionis et al. 2009], the cloud servers intercept the graphics commands, compress the commands, and stream them to the clients. The clients then render the game scenes using their graphics chips, based on graphics command sets such as OpenGL and Direct3D. The clients' graphics chips must be not only compatible with the streamed graphics commands but also powerful enough to render the game scenes in high quality and real time. Because this approach imposes more workload on the clients, it is less suitable for resource-constrained devices, such as mobile devices and set-top boxes.
In contrast, with the video streaming approach [Winter et al. 2006, Holthe et al. 2009], the cloud servers render the 3D graphics commands into 2D videos, compress the videos, and stream them to the clients. The clients then decode and display the video streams. The decoding can be done using low-cost video decoder chips massively produced for consumer electronics. This approach relieves the clients from computationally-intensive 3D graphics rendering and is ideal for thin clients on resource-constrained devices. Since the video streaming approach does not rely on specific 3D chips, the same thin clients can be readily ported to different platforms, which are potentially GPU-less.
The approach of video streaming with post-rendering operations [Shi et al. 2011, Giesen et al. 2008] lies between 3D graphics streaming and video streaming. While the 3D graphics rendering is performed at the cloud servers, some post-rendering operations are optionally done on the thin clients for augmenting motions, lighting, and materials using local resources [Chen et al. 2010]. These post-rendering operations have low computational complexity and run in real time even without GPUs.
Similar to the proprietary cloud gaming systems, the proposed GamingAnywhere employs the video streaming approach for lower loads on the thin clients. Differing from other systems [Winter et al. 2006, Holthe et al. 2009] in the literature, GamingAnywhere is open, modularized, cross-platform, and efficient. In fact, it is the first complete system of its kind, and is of interest to researchers, game developers, service providers, and end users. The initial version of GamingAnywhere [Huang et al. 2013] was introduced to the public in February 2013. Since then we have improved the system in several aspects: We have revised the mechanisms for video frame capture and user control event interception to improve the overall system performance. The architecture is now more neutral to platform-specific details, so that GamingAnywhere supports exactly the same functionality on all the supported platforms. Further, we have integrated a number of new video encoders with GamingAnywhere and provide measurement studies based on the popular VP8 video encoder. GamingAnywhere has matured, and researchers and developers are now able to integrate the proposed system in their preferred flavor: as a standalone application, as a static library, or as a dynamically-linked shared object.

2.2  Measuring the Performance of Cloud Gaming Systems

Measuring the performance of desktop streaming systems has long been considered in the literature [Lai and Nieh 2006, Nieh et al. 2003, Wong and Seltzer 1999, Packard and Gettys 2003, Tolia et al. 2006]. Slow-motion benchmarking [Lai and Nieh 2006, Nieh et al. 2003] runs a slow-motion version of an application on the server, and analyzes network packets between the server and thin client. Slow-motion benchmarking alters the execution speed of applications, and is thus less suitable for real-time applications, including cloud games. Packard and Gettys [2003] analyze the network traces between the X-Window server and client under diverse network conditions. The traces are used to compare the compression ratios of different compression mechanisms, and to quantify the effects of network impairments. Wong and Seltzer [1999] measure the performance of the Windows NT Terminal Service, in terms of process, memory, and network bandwidth. The Windows NT Terminal Service is found to be generally efficient with multi-user access, but the response delay increases when the system load is high. Tolia et al. [2006] quantify the performance of several applications running on a VNC server, which is connected to a VNC thin client via a network with diverse round-trip times (RTTs). It is determined that the response delay of these applications highly depends on the degree of the application's interactivity and the network RTT. The performance metrics considered in [Lai and Nieh 2006, Nieh et al. 2003, Wong and Seltzer 1999, Packard and Gettys 2003, Tolia et al. 2006] are only suitable for desktop streaming systems, which do not impose strict real-time requirements.
More recently, a few studies measure the performance of remote desktop streaming and cloud gaming systems. The methodology of Chang et al. [2011] for studying the performance of games on desktop streaming systems has been employed to evaluate several implementations, including LogMeIn [LogMeIn 2012], TeamViewer [TeamViewer 2012], and UltraVNC [UltraVNC 2012]. Chang et al. establish that player performance and Quality-of-Experience (QoE) depend on video quality and frame rates. It is observed that the desktop streaming systems cannot support cloud games given that the achieved frame rate is as low as 9.7 fps [Chang et al. 2011]. Chen et al. [2011] propose another methodology to quantify the response delay, which is even more critical to cloud games [Claypool and Claypool 2006, Henderson 2003, Zander et al. 2005]. Two proprietary cloud gaming systems, OnLive [OnLive 2012] and StreamMyGame [StreamMyGame 2012], are evaluated using this methodology. Their evaluation results reveal that StreamMyGame suffers from a high response delay, while OnLive achieves reasonable response delay. Chen et al. [2011] point out that the performance edge of OnLive can be attributed to its customized hardware platform and optimized game software. In addition, Lee et al. [2012] evaluate whether computer games are equally suitable to the cloud gaming setting and find that some games are more "compatible" with cloud gaming than others. Meanwhile, Choy et al. [2012] evaluate whether a large-scale cloud gaming infrastructure is feasible on the current Internet and propose a smart-edge solution to mitigate user-perceived delays when playing on the cloud.
In light of the literature review, the current paper tackles the following question: Can we do better than OnLive using commodity desktops and unmodified game software? We employ the measurement methodologies proposed in [Chen et al. 2011] to compare the proposed GamingAnywhere against OnLive [OnLive 2012].
bigpic.png
Figure 1: The deployment scenario of GamingAnywhere.
eps/arch.png
Figure 2: The server and the client architecture of GamingAnywhere.

3  System Architecture

Figure 1 presents the high-level deployment scenario of GamingAnywhere. A user first logs into the system via a portal server, which provides a list of available games to the user. The user then selects a preferred game and requests to play the game. Upon receipt of the request, the portal server finds an available game server, launches the selected game on the server, and returns the game server's URL to the user. Finally, the user connects to the game server and starts to play. The portal server is a web service providing the login and game-selection user interface. If login and game-selection requests are sent from a customized client, the portal server does not even need a fancy user interface. Actions can be sent as REST-like [Fielding 2000, Costello 2007] requests via standard HTTP or HTTPS protocols. The design of the portal server is out of the scope of this article.
Figure 2 shows the architecture of the game server and the game client of GamingAnywhere. We define two types of network flows in the architecture, the data flow and the control flow. Whereas the data flow is used to stream audio and video (A/V) frames from the server to the client, the control flow runs in the reverse direction, and is used to send the user's actions from the client to the server. The system architecture of GamingAnywhere allows it to support both PC- and Web-based games. The game selected by a user runs on a game server. There is an agent running along with the selected game on the same server. The agent can be a stand-alone process or a module (in the form of a shared object or DLL) injected into the selected game. The choice depends on the type of the game and how the game is implemented. The agent has two major tasks. The first task is to capture the A/V frames of the game, encode the frames using the chosen codecs, and then deliver the encoded frames to the client via the data flow. The second task of the agent is to interact with the game. On receipt of the user's actions from the client, it must act on behalf of the user by replaying the received keyboard, mouse, joystick, and even gesture events. However, as there exist no standard protocols for delivering users' actions, we design and implement the transport protocol for user actions ourselves.
The client is a customized game console implemented by combining an RTSP/RTP multimedia player and a keyboard/mouse logger. The system architecture of GamingAnywhere allows observers1 by nature because the server delivers encoded A/V frames using the standard RTSP and RTP protocols. In this way, an observer can watch a game play by simply accessing the corresponding game URL with full-featured multimedia players, such as the VLC media player [VideoLAN], which are available on almost all OS's and platforms.

4  Implementation

The implementation of GamingAnywhere includes the server and the client, each of which contains a number of modules whose details are elaborated in this section. GamingAnywhere leverages several external libraries, including libavcodec/libavformat [FFmpeg project], live555 [Live Networks], and the SDL library [Lantinga]. The libavcodec/libavformat library is part of the ffmpeg project, which is a package to record, convert, and stream audio and video. We use this library to encode and decode the A/V frames on both the server and the client. In addition, it is also used to handle the RTP protocol at the server. The live555 library is a set of C++ libraries for multimedia streaming using open standard protocols (RTSP, RTP, RTCP, and SIP). We use this library to handle the RTSP/RTP protocols [Schulzrinne et al. 1998, Schulzrinne et al. 2003] at the client. The Simple DirectMedia Layer (SDL) library is a cross-platform multimedia library designed to provide low-level access to audio, keyboard, mouse, joystick, and 3D hardware via OpenGL, as well as a 2D video frame buffer. We use this library to render audio and video at the client. All the above libraries have been ported to a number of platforms, including Windows, Linux, OS X, iOS, and Android.
eps/servermodules.png
Figure 3: The relationships among server modules, shared buffers, and network connections.

4.1  GamingAnywhere Server

The relationships among server modules are depicted in Figure 3. Some of the modules are implemented in separate threads. When an agent is launched, its four modules, i.e., the RTSP server, the audio source, the video source, and the input replayer, are launched as well. The RTSP server and the input replayer modules are started immediately to wait for incoming clients (starting from paths 1n and 1i in the figure). The audio source and the video source modules are kept idle after initialization. When a client connects to the RTSP server, the encoder threads are launched, and each encoder must notify the corresponding source module that it is ready to encode the captured frames. The source modules then start to capture audio and video frames when one or more encoders are ready to work. Encoded audio and video frames are generated concurrently in real time. The data flows of audio and video frame generation are depicted as the paths from 1a to 5a and from 1v to 5v, respectively. The details of each module are explained in the following subsections.

4.1.1  RTSP, RTP, and RTCP Server

The RTSP server thread is the first thread launched in the agent. It accepts RTSP commands from a client, launches encoders, and sets up data flows for delivering encoded frames. The data flows can be conveyed by a single network connection or multiple network connections, depending on the preferred transport layer protocol, i.e., TCP or UDP. In the case of TCP, encoded frames are delivered as interleaved binary data in RTSP [Schulzrinne et al. 1998], hence necessitating only one data flow network connection. Both RTSP commands and RTP/RTCP packets are sent via the RTSP over TCP connection established with a client. In the case of UDP, encoded frames are delivered based on the RTP over UDP protocol. Three network flows are thus required to accomplish the same task: In addition to the RTSP over TCP connection, two RTP over UDP flows are used to deliver encoded audio and video frames, respectively.
We implement the mechanisms for handling RTSP commands and delivering interleaved binary data by ourselves, while using the libavformat library to do the packetization of RTP packets. If encoded frames are delivered as interleaved binary data, the frames are filled into a raw RTP packet and sent using the existing RTSP stream. On the other hand, if encoded frames are delivered via RTP over UDP, they are sent directly to the client via libavformat calls.
The RTSP server thread exports a programming interface for encoders to send encoded frames. Therefore, whenever an encoder generates an encoded frame, it can send the frame to the client via the interface without knowing the details of the underlying network connections. The UML diagram of the RTSP/RTP protocol and the involved server components is shown in Figure 4. A client connects to the RTSP server, requests the codec and channel information with the DESCRIBE command, sets up transport layer protocols for the delivery of audio and video data with the SETUP command, and then starts receiving audio and video data after the PLAY command. The audio and video encoders are launched by the RTSP server on receipt of the PLAY command from the client. Meanwhile, the audio and the video sources also start to capture and feed audio and video frames to the corresponding encoders, which are responsible for encoding and sending the encoded data to the client. The audio and video encoders are terminated when the client tears down the session or is disconnected.
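To make this concrete, the following is a minimal sketch of what such a frame-delivery interface could look like; the class and method names are hypothetical illustrations, not the actual GamingAnywhere API.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical frame-delivery interface exported by the RTSP server thread
// (names are illustrative, not from the GamingAnywhere source). An encoder
// calls sendFrame() without knowing whether the payload goes out as
// interleaved RTSP/TCP data or as RTP over UDP.
enum class Channel { Video, Audio };

class FrameSender {
public:
    virtual ~FrameSender() = default;

    // 'pts' is the frame timestamp; returns false if the client has left.
    virtual bool sendFrame(Channel channel, const uint8_t* data,
                           size_t size, int64_t pts) = 0;
};
```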
eps/netuml.png
Figure 4: The UML diagram for the RTSP/RTP protocol and the involved GamingAnywhere server components.

4.1.2  Video Source

Capturing of game screens (frames) is a platform-dependent task. We currently provide two implementations of the video source module to capture the game screens in real time. One implementation is called the desktop capture module, which captures the entire desktop screen on demand, and extracts the desired region if necessary. The other implementation is called the API intercept module, which intercepts a game's graphics drawing function calls and captures the screen directly from the game's back buffer [Microsoft 2012] as soon as the rendering of a new game screen is completed.
Given a desired frame rate (commonly expressed in frames per second, fps), the two implementations of the video source module work in different ways. The desktop capture module is triggered in a polling manner; that is, it actively takes a screenshot of the desktop at a specified frequency. For example, if the desired frame rate is 24 fps, the capture interval will be 1/24 sec ( ≈ 41.7 ms). By using a high-resolution timer, we can keep the rate of screen captures approximately equal to the desired frame rate. On the other hand, the API intercept module works in an event-driven manner. Whenever a game completes the rendering of an updated screen in the back buffer, the API intercept module has an opportunity to capture the screen for streaming. Because this module captures screens in an opportunistic manner, we use a token bucket rate controller [Tanenbaum 2002] to decide whether the module should capture the screen, in order to achieve the desired streaming frame rate. For example, assuming a game updates its screen 100 times per second and the desired frame rate is 50 fps, the API intercept module will only capture one game screen for every two screen updates. However, if the game's frame rate is lower than the desired rate, the module simply captures game screens at the same rate as the game renderer. Each captured frame is associated with a timestamp, which is a zero-based sequence number. Captured frames along with their timestamps are stored in a shared buffer owned by the video source module and shared with a video encoder. The video source module serves as the only buffer writer, while the video encoder is the buffer reader. Therefore, a reader-writer lock must be acquired every time before accessing the shared buffer.
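As a concrete illustration of this rate control, the sketch below implements a simple token bucket in the spirit of the description above; the class name and parameters are ours, not taken from the GamingAnywhere source.

```cpp
#include <algorithm>
#include <chrono>

// Illustrative token-bucket rate controller. Tokens accumulate at the
// desired frame rate; capturing one frame from the back buffer costs one
// token, so capture opportunities in excess of the target rate are skipped.
class TokenBucket {
public:
    explicit TokenBucket(double fps, double burst = 1.0)
        : rate_(fps), capacity_(burst), tokens_(burst),
          last_(std::chrono::steady_clock::now()) {}

    // Called on every back-buffer update; returns true if this update
    // should be captured and handed to the video encoder.
    bool allowCapture() {
        auto now = std::chrono::steady_clock::now();
        double elapsed = std::chrono::duration<double>(now - last_).count();
        last_ = now;
        tokens_ = std::min(capacity_, tokens_ + rate_ * elapsed);
        if (tokens_ >= 1.0) { tokens_ -= 1.0; return true; }
        return false;
    }

private:
    double rate_;       // tokens per second, i.e., the desired frame rate
    double capacity_;   // bucket size; 1.0 keeps at most one pending capture
    double tokens_;
    std::chrono::steady_clock::time_point last_;
};
```

With a 50 fps target and a game that renders at 100 fps, roughly every other call returns true; if the game renders below 50 fps, every call succeeds and the capture rate simply follows the game's own frame rate, matching the behavior described above.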
At present, the desktop capture module is implemented in Linux and Windows. We use the MIT-SHM extension for the X-Window system to capture the desktop on Linux and use GDI to capture the desktop graphics on Windows. As for the API intercept module, it currently supports DirectDraw, Direct3D, and SDL games on Windows and SDL games on Linux. Both modules support captured frames of pixel formats in RGBA, BGRA, and YUV420P, with a high extensibility to incorporate other pixel formats for future needs.

4.1.3  Audio Source

Capturing of audio frames is a platform-dependent task as well. In our implementation, we use the ALSA library and Windows audio session API (WASAPI) to capture sound on Linux and Windows, respectively. The audio source module regularly captures audio frames (also called audio packets) from an audio device (normally the default waveform output device). The captured frames are copied by the audio source module to a buffer shared with the encoder. The encoder will be awakened each time an audio frame is generated to encode the new frame. To simplify the programming interface of GamingAnywhere, we require each sample of audio frames to be stored as a 32-bit signed integer.
One issue that an audio source module must handle is the frame discontinuity problem. When there is no application generating any sound, the audio read function may return either 1) an audio frame with all zeros, or 2) an error code indicating that no frames are currently available. In the second case, the audio source module must still emit silence frames to the encoder, because encoders normally expect a continuous stream of audio frames regardless of whether audible sound is present; generating silence frames in this case resolves the frame discontinuity problem. We observed that modern Windows games often play audio using WASAPI, which suffers from the frame discontinuity problem. Our WASAPI-based audio source module overcomes the problem by carefully estimating the duration of silence periods and generating silence frames accordingly, as illustrated in Figure 5. From the figure, the length of the silence frame should ideally be t1−t0; however, the estimated silence duration may be slightly longer or shorter if the timer accuracy is not sufficiently high.
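The sketch below illustrates the silence-filling idea under the 32-bit sample format mentioned above: when the capture API reports that no frames are available, the module estimates the length of the gap from wall-clock time and emits that many zero-valued samples. The function and its parameters are illustrative, not the actual GamingAnywhere code.

```cpp
#include <chrono>
#include <cstdint>
#include <vector>

// Illustrative silence filling for the frame discontinuity problem.
// t0 is when the last real audio frame ended and t1 is "now"; samples are
// 32-bit signed integers, matching the format described in the text.
std::vector<int32_t> makeSilence(std::chrono::steady_clock::time_point t0,
                                 std::chrono::steady_clock::time_point t1,
                                 int samplesPerSec, int channels) {
    double gapSeconds = std::chrono::duration<double>(t1 - t0).count();
    // Number of per-channel samples covering the silent period [t0, t1).
    auto n = static_cast<size_t>(gapSeconds * samplesPerSec);
    return std::vector<int32_t>(n * channels, 0);  // all-zero (silent) samples
}
```

As noted above, the estimate t1−t0 can be slightly off when the timer resolution is coarse, so the emitted silence may be marginally longer or shorter than the actual gap.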
eps/silence.png
Figure 5: Sample audio signals that may cause the frame discontinuity problem.

4.1.4  Frame Encoding

Audio and video frames are encoded by two different encoder modules, which are launched when there is at least one client connected to the game server. GamingAnywhere currently supports a one-encoder-for-all mode. In this mode, the frames generated by a frame source are read and encoded by only one encoder, regardless of the number of observers2. Therefore, a total of two encoders, one for video frames and another for audio frames, are in charge of the encoding tasks. The benefit of this mode is better efficiency, as the CPU usage does not increase when there are more observers. All the video and audio frames are encoded only once, and the encoded frames are delivered to the corresponding clients in a unicast manner. If different stream qualities must be provided to different observers in the one-encoder-for-all mode, we suggest deploying a multimedia transcoder between the game server and the observers.
Presently, both the video and audio encoder modules are implemented using the libavcodec library, which is part of the ffmpeg project. The libavcodec library supports various audio and video codecs and is completely written in C. Therefore, GamingAnywhere can use any codec supported by libavcodec. In addition, since the libavcodec library is highly extensible, researchers can easily integrate their own codecs into GamingAnywhere to evaluate their performance in cloud gaming.
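To illustrate how an encoder module can be built on top of libavcodec, the sketch below opens an H.264 encoder with low-latency settings and forwards the resulting packets to a sender callback. It uses libavcodec's current send/receive API and example parameter values; it is a simplified illustration, not the actual GamingAnywhere encoder module.

```cpp
extern "C" {
#include <libavcodec/avcodec.h>
#include <libavutil/opt.h>
}

// Open an H.264 encoder via libavcodec (illustrative sketch).
AVCodecContext* openVideoEncoder(int width, int height, int fps, int bitrate) {
    const AVCodec* codec = avcodec_find_encoder_by_name("libx264");
    if (!codec) return nullptr;
    AVCodecContext* ctx = avcodec_alloc_context3(codec);
    ctx->width = width;
    ctx->height = height;
    ctx->time_base = AVRational{1, fps};
    ctx->pix_fmt = AV_PIX_FMT_YUV420P;
    ctx->bit_rate = bitrate;
    // Low-latency settings exposed by the libx264 wrapper as private options.
    av_opt_set(ctx->priv_data, "preset", "ultrafast", 0);
    av_opt_set(ctx->priv_data, "tune", "zerolatency", 0);
    if (avcodec_open2(ctx, codec, nullptr) < 0) {
        avcodec_free_context(&ctx);
        return nullptr;
    }
    return ctx;
}

// Encode one captured frame and hand every produced packet to the sender,
// e.g., the RTSP server thread's frame-delivery interface.
void encodeAndSend(AVCodecContext* ctx, AVFrame* frame,
                   void (*sendToClient)(const AVPacket*)) {
    AVPacket* pkt = av_packet_alloc();
    if (avcodec_send_frame(ctx, frame) == 0) {
        while (avcodec_receive_packet(ctx, pkt) == 0) {
            sendToClient(pkt);
            av_packet_unref(pkt);
        }
    }
    av_packet_free(&pkt);
}
```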

4.1.5  Input Handling

The input handling module is implemented as a separate thread. This module has two major tasks: 1) to capture input events on the client, and 2) to replay the events occurring at the client on the game server.
Unlike audio and video frames, input events are delivered via a separate connection, which can be TCP or UDP. Although it is possible to reuse the RTSP connection for sending input events from the client to the server, we decided not to adopt this strategy for three reasons: 1) The delivery of input events may be delayed by other messages, such as RTCP packets, sent via the same RTSP connection. 2) Data delivery via RTSP connections incurs slightly longer delays because RTSP is text-based and formatting / parsing text is relatively time-consuming. 3) There is no standard for embedding input events in an RTSP connection, which means that we would need to modify the RTSP library and inevitably make the system more difficult to maintain.
The implementation of the input handling module is intrinsically platform-dependent because the input event structure is OS- and library-dependent. Currently GamingAnywhere supports the four input formats of Windows, X Window, Mac OS X, and SDL. Upon the receipt of an input event3, the input handling module first converts the received event into the format required by the server and sends the event structure to the server. GamingAnywhere replays input events using the SendInput function on Windows, the XTEST extension on Linux, and the CGPostEvent function on Mac OS X. Additionally, with API interception techniques, GamingAnywhere also attempts to replay control events within a game process to shorten the replay latency incurred by the event processing mechanisms of operating systems. While the above replay functions work quite well for most desktop and game applications, some games adopt different approaches for capturing user inputs. For example, the SendInput function on Windows does not work for Batman and Limbo, which are two popular action adventure games. In this case, GamingAnywhere can be configured to use other input replay methods, such as hooking the GetRawInputData function on Windows to "feed" input events whenever the function is called by the games.
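Since no standard protocol exists for input delivery, GamingAnywhere defines its own message format. The sketch below shows one plausible shape of such a message: a small header carrying the event type and payload length, followed by the platform-specific event structure. The field names and values are hypothetical, not the actual GamingAnywhere wire format.

```cpp
#include <cstdint>
#include <cstring>
#include <sys/socket.h>

// Hypothetical input-event message (not the actual GamingAnywhere wire
// format). The header tells the server how to interpret the payload.
#pragma pack(push, 1)
struct InputEventMsg {
    uint16_t magic;        // sanity marker
    uint16_t type;         // e.g., 1 = keyboard, 2 = mouse move, 3 = mouse button
    uint16_t length;       // payload length in bytes
    uint8_t  payload[64];  // platform-specific event structure (SDL, Win32, ...)
};
#pragma pack(pop)

// Send one captured event over the dedicated control connection (TCP here).
bool sendInputEvent(int sock, uint16_t type, const void* event, uint16_t len) {
    if (len > sizeof(InputEventMsg::payload)) return false;
    InputEventMsg msg{};
    msg.magic  = 0x4741;   // "GA", arbitrary marker
    msg.type   = type;
    msg.length = len;
    std::memcpy(msg.payload, event, len);
    size_t total = sizeof(msg) - sizeof(msg.payload) + len;
    return send(sock, &msg, total, 0) == static_cast<ssize_t>(total);
}
```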
eps/clientmodules.png
Figure 6: The relationships among client modules, shared buffers, and network connections.

4.2  GamingAnywhere Client

The client is basically a remote desktop client that displays real-time game screens which are captured at the server and delivered in the form of encoded audio and video frames. The relationships among client modules are shown in Figure 6. The GamingAnywhere client contains two worker threads: one is used to handle user inputs (starting from path 1i) and the other is used to render audio and video frames (starting from path 1r). In this section, we divide the discussion on the client design into three parts, i.e., the network protocols, the decoders, and input handling.

4.2.1  RTSP, RTP, and RTCP Clients

In the GamingAnywhere client, we use the live555 library to handle the network communications. The live555 library is entirely written in C++ with an event-driven design. We take advantage of the class framework of live555 and derive from the RTSPClient and MediaSink classes to register callback functions that handle network events. Once the RTSP client has successfully set up the audio and video sessions, we create two sink classes to handle the encoded audio and video frames, respectively, that are received from the server. Both sink classes inherit from the MediaSink class, and the implemented continuePlaying virtual function is called when the RTSP client issues the PLAY command. The continuePlaying function attempts to receive an encoded frame from the server. When a frame is received successfully, the function triggers a callback function that puts the frame in a buffer and decodes the video frame if possible. The continuePlaying function is then called again to receive the next frame.
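For readers unfamiliar with live555, the sketch below follows the library's usual sink pattern (similar to the DummySink in live555's testRTSPClient example): continuePlaying() requests the next encoded frame and a static callback re-arms the request after each frame arrives. It is a simplified illustration of the mechanism described above, not the actual GamingAnywhere sink classes.

```cpp
#include <liveMedia.hh>

// Simplified live555 sink (illustrative, not the GamingAnywhere classes).
class FrameSink : public MediaSink {
public:
    static FrameSink* createNew(UsageEnvironment& env) { return new FrameSink(env); }

protected:
    explicit FrameSink(UsageEnvironment& env) : MediaSink(env) {}

    virtual Boolean continuePlaying() {
        if (fSource == nullptr) return False;
        // Ask the source for the next encoded frame; afterGettingFrame()
        // is invoked once a complete frame has been received.
        fSource->getNextFrame(fBuffer, sizeof(fBuffer),
                              afterGettingFrame, this,
                              onSourceClosure, this);
        return True;
    }

private:
    static void afterGettingFrame(void* clientData, unsigned frameSize,
                                  unsigned /*numTruncatedBytes*/,
                                  struct timeval /*presentationTime*/,
                                  unsigned /*durationInMicroseconds*/) {
        FrameSink* sink = static_cast<FrameSink*>(clientData);
        // ... put fBuffer[0..frameSize) into the shared buffer and decode ...
        (void)frameSize;
        sink->continuePlaying();  // re-arm to receive the next frame
    }

    unsigned char fBuffer[512 * 1024];  // receive buffer for one encoded frame
};
```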

4.2.2  Frame Buffering and Decoding

To provide a better gaming experience in terms of latency, the video decoder currently does not buffer video frames at all. In other words, the video buffer component in Figure 6 is simply used to buffer packets that are associated with the latest video frame. Because live555 provides us with packet payloads without an RTP header, we detect whether consecutive packets correspond to the same video frame based on the marker bit [Schulzrinne et al. 2003] in each packet. That is, if a newly received packet has a zero marker bit (indicating that it is not the last packet associated with a video frame), it will be appended to the buffer; otherwise, the decoder will decode a video frame based on all the packets currently in the buffer, empty the buffer, and place the newly arrived packet in the buffer. Although this zero-buffering strategy may lead to inconsistency in the video playback rate when network delays are unstable, it reduces the input-response latency due to video playout to a minimum level. We believe that this design tradeoff yields an overall better cloud gaming experience.
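The marker-bit test can be summarized in a few lines; the sketch below mirrors the buffering rule described above (accumulate while the marker bit is zero, decode the buffered packets when it is set, then restart the buffer with the newly arrived packet). The container and decode callback are placeholders, not GamingAnywhere types.

```cpp
#include <cstdint>
#include <vector>

using Packet = std::vector<uint8_t>;

// Zero-buffering strategy keyed on the RTP marker bit (illustrative).
void onRtpPayload(std::vector<Packet>& frameBuffer,
                  const uint8_t* payload, size_t size, bool markerBit,
                  void (*decodeFrame)(const std::vector<Packet>&)) {
    if (!markerBit) {
        // Not the last packet of the frame: keep accumulating.
        frameBuffer.emplace_back(payload, payload + size);
        return;
    }
    // Marker bit set: decode the frame formed by the buffered packets,
    // then start over with the newly arrived packet in the buffer.
    decodeFrame(frameBuffer);
    frameBuffer.clear();
    frameBuffer.emplace_back(payload, payload + size);
}
```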
The way GamingAnywhere handles audio frames is different from its handling of video frames. Upon the receipt of audio frames, the RTSP client thread does not decode the frames, but instead simply places all the received frames in a shared buffer (implemented as a FIFO queue). This is because the audio rendering of SDL is implemented using an on-demand approach. That is, to play audio in SDL, a callback function needs to be registered, and it is called whenever SDL requires audio frames for playback. The memory address m to fill audio frames into and the number of required audio frames n are passed as arguments to the callback function. The callback function retrieves the audio packets from the shared buffer, decodes the packets, and fills the decoded audio frames into the designated memory address m. Note that the callback function must fill exactly n audio frames into the specified memory address as requested. This is not a problem if the number of decoded frames is more than requested; if not, the function waits until a sufficient number of frames is available. We implement this waiting mechanism using a mutex lock. If the RTSP client thread has received new audio frames, it appends the frames to the buffer and also triggers the callback function to read more frames. To produce smooth audio output, the audio decoder starts to play when it has buffered at least n audio frames, which is by default 1024 frames per channel (approximately 23 ms for CD-quality stereo audio).
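The on-demand model follows directly from the SDL audio API: playback is started once, and SDL then pulls data by invoking a registered callback whenever its device buffer needs refilling. The sketch below shows the registration step with example parameters; the buffer-draining and decoding logic described in the text is only indicated by a comment.

```cpp
#include <SDL2/SDL.h>

// SDL invokes this callback whenever it needs 'len' more bytes of audio.
// In a GamingAnywhere-style client it would pop packets from the shared
// FIFO, decode them, and copy exactly 'len' bytes into 'stream', waiting
// on a mutex/condition until enough frames are available.
static void audioCallback(void* /*userdata*/, Uint8* stream, int len) {
    SDL_memset(stream, 0, len);  // placeholder: output silence
}

bool startAudio(int sampleRate, int channels) {
    if (SDL_Init(SDL_INIT_AUDIO) != 0) return false;
    SDL_AudioSpec want{}, have{};
    want.freq     = sampleRate;
    want.format   = AUDIO_S16SYS;                 // example playback format
    want.channels = static_cast<Uint8>(channels);
    want.samples  = 1024;                         // frames per callback (cf. text)
    want.callback = audioCallback;
    if (SDL_OpenAudio(&want, &have) < 0) return false;
    SDL_PauseAudio(0);                            // start playback and callbacks
    return true;
}
```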

4.2.3  Input Handling

The input handling module on the client has two major tasks. One is to capture input events made by game players, and the other is to send captured events to the server. When an input event is captured, the event structure is sent to the server directly. Nevertheless, the client still has to tell the server the format and the length of a captured input event.
At present, GamingAnywhere supports cross-platform SDL event capturing. In addition, on certain platforms, such as Windows, we provide more sophisticated input capture mechanisms to cover games with special input mechanisms and devices. Specifically, we use the SetWindowsHookEx function with the WH_KEYBOARD_LL and WH_MOUSE_LL hooks to intercept low-level keyboard and mouse events. By doing so, every input the player makes can be faithfully reproduced on the game server.
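On Windows, the low-level hooks follow the standard Win32 pattern; a minimal keyboard-only sketch is shown below, with the actual forwarding to the server left as a comment. The helper names are ours, but SetWindowsHookEx, WH_KEYBOARD_LL, and CallNextHookEx are the standard Win32 calls named above.

```cpp
#include <windows.h>

static HHOOK g_kbHook = nullptr;

// Low-level keyboard hook: called for every keyboard event on the system.
static LRESULT CALLBACK keyboardProc(int code, WPARAM wParam, LPARAM lParam) {
    if (code == HC_ACTION) {
        const KBDLLHOOKSTRUCT* kb = reinterpret_cast<KBDLLHOOKSTRUCT*>(lParam);
        // ... convert (wParam, kb->vkCode, kb->scanCode, kb->flags) into the
        // client's input-event message and send it to the server ...
        (void)kb;
    }
    // Always pass the event on so local input processing is unaffected.
    return CallNextHookEx(g_kbHook, code, wParam, lParam);
}

bool installKeyboardHook() {
    g_kbHook = SetWindowsHookEx(WH_KEYBOARD_LL, keyboardProc,
                                GetModuleHandle(nullptr), 0);
    return g_kbHook != nullptr;
}
```

Note that a low-level hook is only delivered while the installing thread runs a message loop; a mouse hook with WH_MOUSE_LL and MSLLHOOKSTRUCT follows the same pattern.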

5  Experiment Studies

In this section, we conduct extensive experiments to evaluate GamingAnywhere.

5.1  Setup

eps/topology.png
Figure 7: The network topology of our experiments.
In our lab, we set up a testbed consisting of Windows 7 desktops with Intel 2.67 GHz i7 processors. Figure 7 illustrates the experimental setup, which consists of a server, a client, and a router. We compare the performance of GamingAnywhere against OnLive [OnLive 2012]. The OnLive client connects to the OnLive server over the Internet, while the GamingAnywhere client connects to its server via a LAN. To impose diverse network conditions, we add a FreeBSD router between the client and server, and run dummynet on it to inject constraints on delay, packet loss, and network bandwidth. Although the OnLive server is outside our LAN, we observed that the quality of the path was consistently good throughout the experiments. In particular, its network delay was around 130 ms and its packet loss rate was less than 10^-6. Hence, the path between the OnLive server and our client is essentially a communication channel with sufficient bandwidth, a zero packet loss rate, and a constant 130 ms latency.
We configure the GamingAnywhere server to use x264 and the encoding parameters recommended in Appendix A. GamingAnywhere runs on UDP by default. We consider three games from three popular genres: action adventure (Batman), first-person shooter (FEAR), and real-time strategy (DOW). Detailed descriptions of the games are given in [Huang et al. 2013]. We set the encoding bit rate to 3 Mbps. The games are streamed at a resolution of 720p with 50 fps (frames per second) unless otherwise specified. The details of the experimental designs and results are given in the rest of this section.

5.2  Responsiveness

We define response delay (RD) as the time difference between a user submitting a command and the corresponding in-game action appearing on the screen. Studies [Claypool and Claypool 2006, Henderson 2003, Zander et al. 2005] report that players of various game genres can tolerate different degrees of RD; for example, first-person shooter game players demand less than 100 ms of RD [Claypool and Claypool 2006]. We adopt the RD measurement procedure proposed in [Chen et al. 2011], which divides the RD into three components: (i) the processing delay (PD), the time the server takes to receive and process a user's command and to encode and transmit the corresponding frame to the client; (ii) the playout delay (OD), the time the client takes to receive, decode, and display a frame; and (iii) the network delay (ND), the round-trip time between the client and the server. Therefore, we have RD = PD + OD + ND.
ND can be measured using probing packets, e.g., in the ICMP protocol, and is not controlled by cloud gaming systems. Thus, for a fair comparison between OnLive and GamingAnywhere, we compare only the processing delay on the server (PD) and the playout delay on the client (OD) of the two systems. Measuring PD and OD is much more challenging than measuring RD, because they occur internally in the cloud gaming systems, which may be closed and proprietary. The procedure detailed in [Chen et al. 2011] measures the PD and OD using external probes only, and thus works for OnLive even though we do not have access to their game servers in the cloud.
For GamingAnywhere, we further divide the PD and OD into subcomponents by instrumenting the server and client. More specifically, PD is divided into: (i) memory copy, which is the time for copying a raw image out of the games, (ii) format conversion, which is the time for color-space conversion, (iii) video encoding, which is the time for video compression, and (iv) packetization, which is the time for segmenting each frame into one or multiple packets. OD is divided into: (i) frame buffering, which is the time for receiving all the packets belonging to the current frame, (ii) video decoding, which is the time for video decompression, and (iii) screen rendering, which is the time for displaying the decoded frame.
eps/without_smg/total_delay.png
Figure 8: Response delays of different systems.
eps/ga_delays.png
Figure 9: Delay decomposition of GamingAnywhere.
eps/without_smg/cpu_fear.png
Figure 10: Implication of CPU on responsiveness. Sample results from FEAR.
Results. Figure 8 reports the average PD (server) and OD (client) achieved by the considered cloud gaming systems. We make several observations. First, the OD is small, ≤ 31 ms in all cases. This reveals that all the decoders are efficient, and the decoding time of different games does not fluctuate too much. Second, GamingAnywhere achieves a much smaller PD, at most 34 ms, than OnLive, whose PD is observed to be as high as 191 ms. This demonstrates the efficiency of the proposed GamingAnywhere: the PDs of OnLive are 3+ times longer than those of GamingAnywhere. Last, only GamingAnywhere achieves sub-100 ms RD.
Figure 9 presents the decomposed delay subcomponents of PD and OD. This figure reveals that the GamingAnywhere server and client are well-tuned, in the sense that all the steps in the pipeline are fairly efficient. Even the most time-consuming operations, video encoding (at the server) and video decoding (at the client), finish each frame within 16 ms and 7 ms on average, respectively. Such short delays contribute to the lower RD of GamingAnywhere.
Figure 10 reports how different CPUs affect the PD of GamingAnywhere. We do not consider OnLive servers as they are managed by OnLive Inc. This figure shows that GamingAnywhere achieves a PD of ≤ 56 ms on Intel C2D and better CPUs. Moreover, GamingAnywhere suffers from higher processing delays on AMD Athlon 64 CPUs, which may be attributed to the SSSE3-enabled x264 binary, as the SSSE3 instruction set is not available on some AMD Athlon 64 CPUs.
eps/without_smg/network_load.png
Figure 11: Network loads incurred by the considered cloud gaming systems.
eps/idle_moving_uplink_bitrates.png
Figure 12: The average uplink traffic rate when DOW is being actively played and kept idle.
eps/uplink_pktsize_histogram_log.png
Figure 13: The uplink packet sizes generated by the OnLive client when DOW is being played and kept idle (without players' actions).

5.3  Network Loads

We recruit an experienced gamer, and ask him to play each game using different cloud gaming systems. Every game session lasts for 10 minutes, and the network packets are captured by Wireshark. For a fair comparison, the player is asked to follow two guidelines. First, he shall visit as many areas as possible and fight the opponents as in normal game plays. Second, he shall repeat his actions and trajectories as much as possible.
Results. Figure 11 plots the uplink and downlink traffic characteristics with 95% confidence intervals. Figures 11(a)-11(c) reveal that GamingAnywhere incurs a much lower uplink traffic load compared to OnLive. Figures 11(d)-11(f) reveal that the downlink bit rates of GamingAnywhere are 2-3 Mbps, whereas those of OnLive are 3-5 Mbps. We notice that OnLive does not support user-configurable video encoding rates and parameters, which prevents it from serving as an experimental testbed like GamingAnywhere. Last, as we will see in Figures 16 and 17, even when we set the encoding bit rate of GamingAnywhere to 3 Mbps, GamingAnywhere still outperforms OnLive in terms of video quality. The gap will be even larger if we increase GamingAnywhere's encoding bit rate.
We make another observation on Figure 11(d): Given that we set the encoding bit rate at 3 Mbps, the download bit rate should never be smaller than that. We took a closer look and found that, with GamingAnywhere, only Batman achieves 50 fps; FEAR and DOW only achieve 35-42 fps, which leads to lower download bit rates and may result in irregular playouts. Our in-depth analysis shows that, unlike Batman, both FEAR and DOW use Direct3D multisampling surfaces, which cannot be locked for memory copies. More specifically, an additional non-multisampling surface and an extra copy operation are required for FEAR and DOW, which in turn slightly affects the achieved frame rates.
Another observation that seems counter-intuitive at first glance is Figure 11(a): The OnLive client sends out much more traffic to the server than GamingAnywhere. Since the uplink traffic should comprise only players' control actions, it seems unreasonable to have such a huge difference in uplink traffic between OnLive and GamingAnywhere. Our further analysis reveals that OnLive generates uplink traffic even when no gameplay actions are performed. Taking DOW as an example, we ask a gamer to play the game normally for 3 minutes and leave the game idle for another 3 minutes. We plot the average uplink traffic rate in Figure 12. The graph shows that OnLive generates around 40 kbps of uplink traffic even when no gameplay actions are performed. Moreover, the differences between the idle and normal gameplay periods are 20 kbps and 30 kbps on GamingAnywhere and OnLive, respectively, which indicates the rate of the traffic corresponding to players' actions. We further dissect the uplink packet sizes of OnLive. As shown in Figure 13, the OnLive client keeps sending packets of 84, 164, 212, and 116 bytes, among other sizes, regardless of whether a player is performing gameplay actions or not. More specifically, even when no gameplay actions are issued, the client sends out 40 packets per second on average with these particular packet sizes. We believe that these packets are employed for path quality estimation and/or application-level acknowledgement purposes.
eps/without_smg/resolution_fear.png
Figure 14: Implication of resolutions on responsiveness, sample results from FEAR.
eps/new_ga_evaluation/fps/fps_batman.png (a) eps/new_ga_evaluation/fps/fps_fear.png (b)
Figure 15: Implication of frame capture rates on responsiveness: (a) Batman and (b) FEAR.

5.4  Impact of Screen Setting

We next vary the screen resolution and frame capture rate. We consider three resolutions (1280x720, 1024x768, and 640x480) and three frame capture rates (50, 25, and 10 fps). Since OnLive does not support different resolutions, we do not consider it in this experiment.
Results. Figure 14 gives the impact of resolutions. This figure shows that GamingAnywhere scales well with screen resolution, in terms of PD and OD. Figure 15 presents the PD and OD under different frame capture rates, and shows that GamingAnywhere achieves stable PD and OD across these rates.
eps/without_smg/network_psnr.png
Figure 16: Achieved video quality in PSNR under different network conditions.
eps/without_smg/network_ssim.png
Figure 17: Achieved video quality in SSIM under different network conditions.

5.5  Streaming Quality under Different Network Conditions

Network conditions are key to high-quality video streaming, and we use dummynet to vary the ND between 0-600 ms, the packet loss rate between 0-10%, and the bandwidth between 1-6 Mbps. We also include experiments with unlimited bandwidth. For OnLive, the ND in the Internet is already 130 ms, and thus we cannot report results for zero ND. Two video quality metrics, PSNR [Wang et al. 2001, p. 29] and Structural Similarity (SSIM) [Wang et al. 2004], are adopted. We run GamingAnywhere on both UDP and TCP in this experiment.
Results. Figures 16 and 17 present the PSNR and SSIM values, respectively. We make four observations. First, ND does not affect the video quality too much (Figures 16(a) and 17(a)). Second, GamingAnywhere achieves much higher video quality than OnLive: up to 5 dB (PSNR) and 0.05 (SSIM) gaps are observed. Third, GamingAnywhere over UDP suffers from quality drops when the packet loss rate is nontrivial, as illustrated in Figures 16(b) and 17(b). This can be attributed to the missing error resilience mechanism in GamingAnywhere. The issue can be addressed by running GamingAnywhere over TCP, which leads to stable video quality even under nontrivial packet loss rates, as shown in Figures 16(b) and 17(b). Last, Figures 16(c) and 17(c) show that the video quality of GamingAnywhere suddenly drops when the bandwidth is smaller than the encoding bit rate of 3 Mbps. In summary, GamingAnywhere performs well under diverse network conditions, except: (i) when the available bandwidth is lower than the specified video encoding bit rate and (ii) when the packet loss rate is high while GamingAnywhere runs on UDP. These advantages of GamingAnywhere are achieved at a rather low video coding rate of 3 Mbps (see Figure 11(d)).

5.6  Performance Profiling

We also profile the performance of the GamingAnywhere server to identify possible performance bottlenecks. We use oprofile [Levon] and lttng [Desnoyers et al.] to conduct the measurements on Ubuntu Linux with the game Cube 2: Sauerbraten [van Oortmerssen], which is a 3D first-person shooter game based on SDL and OpenGL. We use oprofile to retrieve performance counters at the function level without modifying and recompiling the source code; meanwhile, we use lttng to obtain the usage of kernel functions and CPU counters. To observe the performance overheads of the GamingAnywhere server, we play the selected game with two different setups. In the first setup, we play the game locally without the involvement of any cloud gaming technologies. In the second setup, we launch the game with the GamingAnywhere event-driven server and play the game remotely. In each game session, we choose the single-player private "Run N' Gun Part I" campaign, kill five monsters, and then quit the game. We play each game setup three times, and the average game play length is approximately 1 minute.
Results. Table I summarizes the profiling results. When playing with GamingAnywhere, almost all the performance counters rise significantly. This is because many GamingAnywhere server operations, such as video frame capture (via libdricore.so), video encoding (in libx264.so), and color space conversion (in libga.so), require significant memory accesses and CPU time. Even so, the profiling results indicate that these operations are performed efficiently, in that the branch miss rate and dTLB store miss rate with GamingAnywhere are even better (i.e., lower) than those without GamingAnywhere. The results also reveal that GamingAnywhere does not incur much overhead on the system kernel; actually, the number of system calls with GamingAnywhere is even smaller than that without GamingAnywhere. A closer look shows that the local game session invokes a large number of ioctl system calls, which are presumably used to handle inputs from control devices and outputs to the audio and graphics devices. While a remote game session with GamingAnywhere still requires input handling, it does not need to output audio and graphics to local devices; thus, the ioctl system calls are largely eliminated. We also observe that GamingAnywhere makes a large number of futex system calls, which are used to synchronize threads via mutexes. This is because GamingAnywhere adopts a multi-threaded architecture to manage the game screen capture, encoding, and packetization pipeline, and some of the threads inevitably require access to shared buffers, so mutexes are used to ensure synchronization between threads whenever necessary.

5.7  Multiple GamingAnywhere Instances

So far, we have run a single instance of GamingAnywhere on a game server. Since GamingAnywhere does not exclusively use any input/output device, it is possible to run multiple instances of GamingAnywhere on the same server so that the server can serve multiple players at the same time. Using the same testbed (Windows 7 desktops with Intel 2.67 GHz i7 processors), we run 1, 2, 3, and 4 instances of GamingAnywhere on the game Cube 2: Sauerbraten, with each instance serving one client, and measure the average processing delays of the instance(s).
Results. We plot the results in Figure 18, which shows that the processing delays of the GamingAnywhere server increase linearly with the number of simultaneous instances. This indicates that GamingAnywhere scales well with multiple instances and is capable of serving as a cloud gaming solution for hosting large-scale remote gaming services.
eps/ga_multi_instance.png
Figure 18: Processing delays of multiple GamingAnywhere instances running the game Cube 2: Sauerbraten.

6  Conclusion

We presented GamingAnywhere, the first open cloud gaming system, which is designed to be open, extensible, portable, and fully configurable. Via extensive experiments, we have shown that GamingAnywhere significantly outperforms a well-known, commercial cloud gaming system: OnLive. Compared to GamingAnywhere, for example, OnLive suffers from up to 3 times higher processing delays and 5 dB lower video quality. GamingAnywhere: (i) incurs lower network loads in both uplink and downlink directions, (ii) scales well with screen resolutions and frame capture rates, and (iii) adapts to diverse network conditions (except some boundary cases). We expect that cloud game developers, cloud service providers, system researchers, and individual users will use GamingAnywhere to set up complete cloud gaming testbeds for different purposes. In fact, within a few weeks of making GamingAnywhere available online at http://gaminganywhere.org in April 2013, we received many inquiries. We firmly believe that the release of GamingAnywhere will stimulate more research innovations on cloud gaming systems, or multimedia streaming applications in general.
Future work. GamingAnywhere enables several future research directions in cloud gaming. For example, this article does not address the deployment problem, in which cloud game hosting companies have to find the best tradeoff between hardware and bandwidth investments and the achieved gaming experience. Techniques for cloud management, such as resource allocation and Virtual Machine (VM) migration, are critical to the success of commercial deployments. These cloud management techniques need to be optimized for cloud games; e.g., VM placement decisions need to be aware of gaming experience. Tiered cloud platforms may also be used in real deployments; e.g., relatively delay-tolerant games may be served by servers in larger but more distant data centers, while delay-sensitive games may be served by servers in edge clouds for high responsiveness.
GamingAnywhere aims to support cloud games transparently, i.e., without any code customization, and thus works with almost all computer games. Another design alternative is to provide APIs for game developers to call, which may enable more optimization opportunities and reduce overhead. For example, if game developers adopt such APIs, GamingAnywhere will not need to hook functions in the operating system, which results in lower overhead. Another more aggressive, although complex, optimization strategy is to divide the game engine into multiple components and dynamically distribute these components over multiple cloud servers based on the demanded and available resources of the games and servers. While such optimization is beyond the scope of this article, GamingAnywhere is an enabler for designing, implementing, and evaluating such comprehensive cloud gaming platforms.
eps/x264_6_fig.png
Figure 19: Results from x264. Different presets: (a) encoding frame rates and (b) achieved video quality. Different motion estimation algorithms: (c) encoding frame rates and (d) achieved video quality. Implications of slice-level threading: (e) encoding frame rates and (f) achieved video quality.

7  Appendix: Real-Time Video Encoding Parameters

GamingAnywhere supports the x264 [x264 2012] and vpxenc [WebM 2013] encoders for H.264/AVC and VP8 video encoding, respectively. These encoders are fairly comprehensive and expose many parameters for trading off bit rate, video quality, and encoding complexity. In this section, we conduct extensive experiments on an Intel i7 PC to find the best tradeoff settings for x264 and vpxenc. We use PSNR as the video quality metric and fps as the computational complexity metric. Typically, higher PSNR (higher video quality) comes with lower fps (higher computational complexity). Our goal is to achieve  ∼  60 fps at 720p (1280x720) with the highest rate-distortion (R-D) performance. We record 10-minute game plays in YUV format from three games: Batman, FEAR, and DOW, which are chosen from three different game genres (see Section 5.1). We then encode the raw videos with various parameters and report our observations.
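For completeness, PSNR between a decoded frame and its raw source is commonly computed (for 8-bit samples, often on the luminance channel) as

PSNR = 10 log10(255^2 / MSE) dB,

where MSE is the mean squared error between the two frames; per-frame PSNR values are then averaged over the sequence.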

7.1  x264 Encoding Parameters

Mandatory parameters for real-time encoding. Several parameters are required for real-time x264 encoding. First, we need to disable bi-directional (B) frames. We also need to disable the lookahead buffers, which are used for frame type (I, P, or B) decisions and frame-level multi-threading. Disabling these two coding tools can be done with the convenience flag --tune zerolatency. Second, we need to enable slice-level multi-threading to leverage multi-core CPUs without incurring additional delay. Slices are essentially disjoint regions of each video frame. Slice-level multi-threading cuts each frame into t slices and allocates a thread to encode each slice. x264 supports: (i) --sliced-threads to enable slice-level multi-threading, (ii) --slices to specify the number of slices, and (iii) --threads to control the number of threads. Third, x264 supports intra refresh to control error propagation due to packet losses. With intra refresh, each frame contains a column of intra-coded macroblocks, and this intra-coded column moves over time. x264 enables intra refresh via the flag --intra-refresh.
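The following minimal sketch applies only the mandatory real-time flags discussed above; the raw input file, resolution, and frame rate options are illustrative:

x264 --tune zerolatency --sliced-threads --slices 4 --threads 4 --intra-refresh
     --input-res 1280x720 --fps 60 -o out.264 game.yuv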
Implications of other parameters. We exercise several other parameters and present their implications on video quality and computational complexity. --preset selects a predefined complexity level, ranging from ultrafast to very slow; --bitrate sets the target bit rate; --me selects the motion estimation algorithm, which can be diamond, hexagonal, multi-hexagonal, exhaustive, or Hadamard exhaustive search, in increasing order of complexity; and --merange specifies the motion vector search range, from 4 to 64 pixels. Moreover, we consider the number of threads t ∈ {1, 2, 4, 6, 8} and GoP size g ∈ {12, 24, 48, 96, 192, 384}. The GoP size is set by --keyint. Unless otherwise specified, we employ --preset fast, --bitrate 1000, --me hex, --merange 16, t=4, and g=384.
We plot the results from different presets in Figures 19(a) and 19(b). We find that the very fast preset leads to 100+ fps in all considered games. Moreover, moving from very fast to fast results in at most a 0.46 dB quality improvement, at the cost of at least a 60 fps loss. Based on this observation, we recommend using very fast. We present the results from different motion estimation algorithms in Figures 19(c) and 19(d). Figure 19(d) shows that different algorithms lead to almost the same video quality. Figure 19(c) reveals that the exhaustive search algorithms may reduce the encoding frame rate by about 50%, for virtually no quality improvement. Hence, we recommend the diamond search algorithm (dia). We also find that increasing the search range (16 pixels by default) has negligible impact on video quality and encoding complexity. We give the results from different numbers of threads in Figures 19(e) and 19(f). Figure 19(e) reveals that more slice-level threads result in higher encoding frame rates. Figure 19(f) shows that slice-level multi-threading leads to insignificant quality degradation. For example, increasing the number of threads from 1 to 4 results in a small video quality degradation of at most 0.19 dB, while the encoding frame rate is almost doubled. Hence, we recommend using 4 threads. We plot the video quality with different GoP sizes in Figure 20, which shows that Batman and FEAR are less sensitive to the GoP size. DOW is more vulnerable to smaller GoP sizes because the background of Real-Time Strategy (RTS) games does not move rapidly and thus offers more inter-frame redundancy, which frequent intra refreshes (i.e., smaller GoP sizes) fail to exploit. We find that the GoP size does not affect the computational complexity of x264. The best GoP size depends on the network condition, and we choose a medium GoP size of 48.
eps/x264_test2c_psnr.png
Figure 20: Results from x264. Quality degradation due to smaller GoP size.
eps/test3_rate.png
Figure 21: Tradeoffs between rate and: (a) quality and (b) complexity. Results from x264.
Performance of x264. We report the complexity-rate-distortion relation under the recommended encoding parameters:
--profile main --preset faster --tune zerolatency --bitrate $r --ref 1 --me dia
--merange 16 --intra-refresh --keyint 48 --sliced-threads --slices 4 --threads 4
--input-res 1280x720,

where $r is the encoding bit rate. We vary $r between 250 and 3000 kbps and plot the rate-complexity and rate-quality curves in Figure 21. Figure 21(a) reveals that, for Batman, FEAR, and DOW, we achieve a good video quality of 35 dB at bit rates of only about 250, 800, and 1500 kbps, respectively. For an excellent video quality of 40 dB [Wang et al. 2001, p. 29], Batman and FEAR require bit rates of about 800 and 2000 kbps, while DOW demands slightly over 3000 kbps. These bit rates are widely available in modern access networks. Last, Figure 21(b) reports the encoding frame rates under different bit rates. The figure reveals that, at a video quality of 35 dB, the encoding frame rates are 160+, 130+, and 140+ fps, which are much higher than the rendering frame rates of most games.
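As an illustrative way to reproduce such a rate sweep, the recommended command line can simply be run in a loop over the target bit rates; the bit rate steps, input file, and output names below are placeholders:

for r in 250 500 1000 1500 2000 2500 3000; do
  x264 --profile main --preset faster --tune zerolatency --bitrate $r --ref 1 --me dia \
       --merange 16 --intra-refresh --keyint 48 --sliced-threads --slices 4 --threads 4 \
       --input-res 1280x720 -o batman_$r.264 batman.yuv
done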
eps/vpxenc_fig.png
Figure 22: Results from vpxenc. Different modes/levels: (a) encoding frame rates and (b) achieved video quality. Different numbers of threads: (c) encoding frame rates and (d) achieved video quality. Implications of GoP sizes: (e) encoding frame rates and (f) achieved video quality.
eps/vpxenc_rate_fig.png
Figure 23: Tradeoffs between rate and: (a) quality and (b) complexity. Results from vpxenc.

7.2  vpxenc Encoding Parameters

Mandatory parameters for real-time encoding. We present the parameters required for real-time vpxenc encoding in the following. First, we need to enable one-pass encoding (instead of two-pass encoding) using --passes=1. Second, we set --end-usage=cbr to use CBR (constant bit rate) encoding, which reduces rate fluctuations and is thus more suitable for real-time streaming. Third, real-time encoding dictates zero buffering, which is achieved by --buf-initial-sz=0, --buf-optimal-sz=0, and --buf-sz=0. Fourth, we enable multi-threading by (i) --threads to specify the number of threads and (ii) --token-parts to specify the number of token partitions (the flag takes the base-2 logarithm of the partition count), where each partition may be encoded by a different entropy encoder (in a different thread). We set the number of partitions to its maximal value of 8 (i.e., --token-parts=3) and vary the number of threads.
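A minimal sketch that applies only the mandatory real-time flags discussed above; the input file, resolution, frame rate, and target bit rate are illustrative:

vpxenc --codec=vp8 --i420 -w 1280 -h 720 --fps=60/1 --passes=1 --end-usage=cbr
       --buf-initial-sz=0 --buf-optimal-sz=0 --buf-sz=0 --threads=4 --token-parts=3
       --target-bitrate=1000 -o game.webm game.yuv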
Implications of other parameters. We study how the other parameters affect video quality and computational complexity. vpxenc exposes fewer encoding parameters than x264. vpxenc supports 3 modes and 23 levels that lead to different tradeoffs between video quality and computational complexity. The three modes are --best, --good, and --rt, where good and rt have 6 and 16 levels, respectively. The 16 levels of mode rt specify a target CPU usage (of vpxenc), from 0 to 100%. We consider 7 encoding modes/levels in total: best, good(0), good(1), ..., good(5), and rt(100%), from high to low complexity. --target-bitrate sets the target bit rate, and --kf-max-dist sets the maximal GoP size. Unless otherwise specified, we employ --good, --cpu-used=4, --target-bitrate=100, --threads 4, and --kf-max-dist 384.
We plot the results from different modes/levels in Figures 22(a) and 22(b). We find that vpxenc only achieves ≤ 50 fps, except with good(5) (--good and --cpu-used=5). Since the video quality drop from good(4) to good(5) is negligible, we recommend using good(5). We present the results from different numbers of threads in Figures 22(c) and 22(d). Figure 22(d) shows that the video quality drops when there are more threads. This can be attributed to the design of multiple partitions and entropy coders: more partitions mean less redundancy for each entropy coder to exploit. Figure 22(c) reveals that 6 threads are required for 60 fps, and thus we recommend using 6 threads. The impact of GoP size is shown in Figures 22(e) and 22(f). Although we observe a tradeoff between video quality and computational complexity, the deviation is moderate, and the best tradeoff depends on the network condition. We choose a medium GoP size of 48.
Performance of vpxenc. We report the complexity-rate-distortion relation under the recommended encoding parameters:
--i420 -w 1280 -h 720 -p 1 -t 6 --token-parts=3 --good --cpu-used=5 --end-usage=cbr
--target-bitrate=$r --fps=30000/1000 --buf-sz=0 --buf-initial-sz=0 --buf-optimal-sz=0
--kf-max-dist=48.

We vary $r between 250 and 3000 kbps and plot the rate-complexity and rate-quality curves in Figure 23. Figure 23(a) shows that vpxenc achieves 35 dB at bit rates of about 250, 1000, and 3000 kbps for Batman, FEAR, and DOW, respectively. Figure 23(b) reveals that the corresponding encoding frame rates are 170+ fps in all considered games. Compared to x264 (Figure 21), vpxenc achieves slightly higher frame rates (by up to 30 fps) at the expense of lower coding efficiency. Nonetheless, the resulting bit rates are available in most access networks nowadays.

References

[Chang et al. 2011] Chang, Y.-C., Tseng, P.-H., Chen, K.-T., and Lei, C.-L. 2011. Understanding the performance of thin-client gaming. In Proceedings of IEEE CQR 2011.
[Chen et al. 2011] Chen, K.-T., Chang, Y.-C., Tseng, P.-H., Huang, C.-Y., and Lei, C.-L. 2011. Measuring the latency of cloud gaming systems. In Proceedings of ACM Multimedia 2011.
[Chen et al. 2010] Chen, Y., Chang, C., and Ma, W. 2010. Asynchronous rendering. In Proc. of ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (I3D'10). Washington, DC.
[Choy et al. 2012] Choy, S., Wong, B., Simon, G., and Rosenberg, C. 2012. The brewing storm in cloud gaming: A measurement study on cloud to end-user latency. In Proceedings of IEEE/ACM NetGames 2012.
[Claypool and Claypool 2006] Claypool, M. and Claypool, K. 2006. Latency and player actions in online games. Communications of the ACM 49, 11, 40-45.
[Costello 2007] Costello, R. L. 2007. Building web services the REST way. xFront - Tutorial and Articles on XML and Web Technologies. http://www.xfront.com/REST-Web-Services.html.
[Desnoyers et al.] Desnoyers, M., Desfossez, J., and Goulet, D. Linux Trace Toolkit - next generation. LTTng Project. https://lttng.org/.
[EA 2012] EA 2012. Electronic Arts buys online gaming studio ESN, the developers behind Battlefield's Battlelog online social network. http://techcrunch.com/2012/09/26/electronic-arts.
[Eisert and Fechteler 2008] Eisert, P. and Fechteler, P. 2008. Low delay streaming of computer graphics. In Proc. of IEEE International Conference on Image Processing (ICIP'08). San Diego, CA, 2704-2707.
[FFmpeg project] FFmpeg project. FFmpeg. http://ffmpeg.org/.
[Fielding 2000] Fielding, R. T. 2000. Architectural styles and the design of network-based software architectures. Ph.D. thesis, University of California, Irvine.
[GaiKai 2012] GaiKai 2012. Gaikai web page. http://www.gaikai.com/.
[Giesen et al. 2008] Giesen, F., Schnabel, R., and Klein, R. 2008. Augmented compression for server-side rendering. In Proc. of International Fall Workshop on Vision, Modeling, and Visualization (VMV'08). Konstanz, Germany.
[Henderson 2003] Henderson, T. 2003. The effects of relative delay in networked games. Ph.D. thesis, Department of Computer Science, University of London.
[Holthe et al. 2009] Holthe, O., Mogstad, O., and Ronningen, L. 2009. Geelix LiveGames: Remote playing of video games. In Proc. of IEEE Consumer Communications and Networking Conference (CCNC'09). Las Vegas, NV.
[Huang et al. 2013] Huang, C.-Y., Hsu, C.-H., Chang, Y.-C., and Chen, K.-T. 2013. GamingAnywhere: An open cloud gaming system. In Proc. of ACM MMSys 2013.
[Jurgelionis et al. 2009] Jurgelionis, A., Fechteler, P., Eisert, P., Bellotti, F., David, H., Laulajainen, J., Carmichael, R., Poulopoulos, V., Laikari, A., Perala, P., Gloria, A., and Bouras, C. 2009. Platform for distributed 3D gaming. International Journal of Computer Games Technology 2009, 1:1-1:15.
[Lai and Nieh 2006] Lai, A. and Nieh, J. 2006. On the performance of wide-area thin-client computing. ACM Transactions on Computer Systems 24, 2, 175-209.
[Lantinga] Lantinga, S. Simple DirectMedia Layer. http://www.libsdl.org/.
[Lee et al. 2012] Lee, Y.-T., Chen, K.-T., Su, H.-I., and Lei, C.-L. 2012. Are all games equally cloud-gaming-friendly? An electromyographic approach. In Proceedings of IEEE/ACM NetGames 2012.
[Levon] Levon, J. OProfile - a system profiler for Linux. http://oprofile.sourceforge.net/.
[Live Networks] Live Networks, Inc. LIVE555 Streaming Media. http://live555.com/liveMedia/.
[LogMeIn 2012] LogMeIn 2012. LogMeIn web page. https://secure.logmein.com/.
[Microsoft 2012] Microsoft. 2012. Flipping surfaces (Direct3D 9). Windows Dev Center - Desktop. http://msdn.microsoft.com/en-us/library/windows/desktop/bb173393%28v=vs.85%29.aspx.
[Nieh et al. 2003] Nieh, J., Yang, S., and Novik, N. 2003. Measuring thin-client performance using slow-motion benchmarking. ACM Transactions on Computer Systems 21, 1, 87-115.
[OnLive 2012] OnLive 2012. OnLive web page. http://www.onlive.com/.
[Packard and Gettys 2003] Packard, K. and Gettys, J. 2003. X Window System network performance. In Proc. of USENIX Annual Technical Conference (ATC'03). San Antonio, TX, 206-218.
[Schulzrinne et al. 2003] Schulzrinne, H., Casner, S., Frederick, R., and Jacobson, V. 2003. RTP: A transport protocol for real-time applications. RFC 3550 (Standard). http://www.ietf.org/rfc/rfc3550.txt.
[Schulzrinne et al. 1998] Schulzrinne, H., Rao, A., and Lanphier, R. 1998. Real Time Streaming Protocol (RTSP). RFC 2326 (Proposed Standard). http://www.ietf.org/rfc/rfc2326.txt.
[Shi et al. 2011] Shi, S., Hsu, C., Nahrstedt, K., and Campbell, R. 2011. Using graphics rendering contexts to enhance the real-time video coding for mobile cloud gaming. In Proc. of ACM Multimedia'11. Scottsdale, AZ, 103-112.
[Sony 2012] Sony 2012. Cloud gaming adoption is accelerating ... and fast! http://www.nttcom.tv/2012/07/09/cloud-gaming-adoption-is-acceleratingand-fast/.
[StreamMyGame 2012] StreamMyGame 2012. StreamMyGame web page. http://streammygame.com/.
[Tanenbaum 2002] Tanenbaum, A. S. 2002. Computer Networks, 4th Ed. Prentice Hall Professional Technical Reference.
[TeamViewer 2012] TeamViewer 2012. TeamViewer web page. http://www.teamviewer.com.
[Tolia et al. 2006] Tolia, N., Andersen, D., and Satyanarayanan, M. 2006. Quantifying interactive user experience on thin clients. IEEE Computer 39, 3, 46-52.
[UltraVNC 2012] UltraVNC 2012. UltraVNC web page. http://www.uvnc.com/.
[van Oortmerssen] van Oortmerssen, W. Cube 2: Sauerbraten. http://sauerbraten.org/.
[VideoLAN] VideoLAN. VLC media player. http://www.videolan.org/vlc/.
[Wang et al. 2001] Wang, Y., Ostermann, J., and Zhang, Y. 2001. Video Processing and Communications. Prentice Hall.
[Wang et al. 2004] Wang, Z., Lu, L., and Bovik, A. 2004. Video quality assessment based on structural distortion measurement. Signal Processing: Image Communication 19, 2, 121-132.
[WebM 2013] WebM 2013. The WebM Project web page. http://www.webmproject.org.
[Winter et al. 2006] Winter, D., Simoens, P., Deboosere, L., Turck, F., Moreau, J., Dhoedt, B., and Demeester, P. 2006. A hybrid thin-client protocol for multimedia streaming and interactive gaming applications. In Proc. of ACM NOSSDAV 2006. Newport, RI.
[Wong and Seltzer 1999] Wong, A. and Seltzer, M. 1999. Evaluating Windows NT Terminal Server performance. In Proc. of USENIX Windows NT Symposium (WINSYM'99). Seattle, WA, 145-154.
[x264 2012] x264 2012. x264 web page. http://www.videolan.org/developers/x264.html.
[Zander et al. 2005] Zander, S., Leeder, I., and Armitage, G. 2005. Achieving fairness in multiplayer network games through automated latency balancing. In Proc. of ACM SIGCHI ACE 2005. Valencia, Spain, 117-124.

Footnotes:

1. In addition to playing a game themselves, hobbyists may also like to watch how other gamers play the same game. An observer can only watch how a game is played but cannot be involved in the game.
2. In the current design, there can be one player and unlimited observers simultaneously in a game session.
3. The capturing of input events on clients will be elaborated in Section 4.2.3.

