The binding between computing devices and displays is becoming dynamic and
adaptive, and screencast technologies enable such binding over wireless
networks. In this article, we design and conduct the first detailed measurement
study on the performance of the state-of-the-art screencast technologies.
Several commercial and one open-source screencast technologies are considered
in our detailed analysis, which leads to several insights: (i) there is no
single winning screencast technology, indicating room to further enhance the
screencast technologies, (ii) hardware video encoders significantly reduce the
CPU usage at the expense of slightly higher GPU usage and end-to-end delay, and
should be adopted in future screencast technologies, (iii) comprehensive error
resilience tools are needed as wireless communication is vulnerable to packet
loss, (iv) emerging video codecs designed for screen contents lead to better
Quality of Experience (QoE) of screencast, and (v) rate adaptation mechanisms
are critical to avoiding degraded QoE due to network dynamics. As a case study,
we propose a non-intrusive yet accurate available bandwidth estimation
mechanism. Real experiments demonstrate the practicality and efficiency of our
proposed solution. Our measurement methodology, open-source screencast
platform, and case study allow researchers and developers to quantitatively
evaluate other design considerations, which will lead to optimized screencast
technologies.
Authors' addresses:
C.-F. Hsu, T.-H. Tsai, K.-T. Chen, 128 Academia Road, Section 2, Nankang, Taipei 11574; email: hsuchihfan@gmail.com, zark912@iis.sinica.edu.tw, swc@iis.sinica.edu.tw;
C.-F. Fan, C.-H. Hsu, No. 101, Section 2, Kuang-Fu Road, Hsinchu, Taiwan 30013; email: yyytr7180@hotmail.com, chsu@cs.nthu.edu.tw;
C.-Y. Huang, 1001 University Road, Hsinchu, Taiwan 30010; email: chuang@cs.nctu.edu.tw.
This work was supported in part by the Ministry of Science and Technology of Taiwan under the grants 103-2221-E-001-023-MY2, 102-2221-E-007-062-MY3, and 103-2221-E-019-033-MY2.
1 Introduction
Wide adoption of heterogeneous computing devices, such as PCs, tablets,
smart TVs, and smartphones, calls for diverse ways for people to share
photos, watch videos, and play games with their family and friends.
Many people prefer to share contents on larger or even multiple screens
rather than being limited to a single screen.
Ubiquitous displays are therefore gradually being deployed in homes,
schools, offices, shops, and even outdoor squares for experience sharing,
education, presentations, and advertisements. According to market
research reports, the global flexible display market is expected to be
worth $3.89 billion by 2020, growing at a high Compound Annual Growth
Rate (CAGR) from 2014 to 2020 [Markets 2014]. Moreover, wireless
networks have surged in popularity. Wireless displays, which show screen
contents without cable connections to computing devices, are expected to
grow at a CAGR of 28.03% from 2012 to 2017 [Markets 2012].
These reports indicate that the binding between computing devices and
displays is becoming more dynamic, leading to flexible and diverse
display experiences.
Such dynamic binding of displays and computing devices can be done via
screencast, which refers to capturing and sending the audiovisual
streams from computing devices over networks to displays in real time.
Screencast enables many usage scenarios, including playing multimedia
contents over home networks, sharing desktops among colleagues over the
Internet, and extending the small built-in displays of mobile and
wearable devices over short-range wireless networks, such as Wi-Fi networks.
Screencast has attracted serious attention from both academia and
industry because of its rich usage scenarios. For example, several
open-source projects [Huang et al. 2014, Chandra et al. 2014] have been
initiated to support screencast among wearable and mobile devices as well
as desktops, tablets, and laptop computers. There are also proprietary,
closed-source commercial products, such as AirPlay [AirPlay 2014],
Chromecast [Chromecast Web Page 2014], Miracast [Miracast 2014],
MirrorOp [MirrorOp Web Page 2014], and Splashtop [Splashtop 2014].
Although screencast is gradually being deployed, the performance of
state-of-the-art screencast technologies has not been rigorously
measured in the literature. Current and future developers and
researchers, therefore, have to make design decisions heuristically when
building screencast technologies.
In this article, we first construct a real testbed to conduct the very
first set of
detailed experiments to quantify the performance of various screencast
technologies under diverse conditions. The conditions are captured by
several key parameters, including resolution, frame rate, bandwidth,
packet loss rate, and network delay. The performance metrics include
video bitrate, video quality, end-to-end latency, and frame loss rate.
We evaluate five commercial
products [AirPlay 2014, Chromecast Web Page 2014, Miracast 2014, MirrorOp Web Page 2014, Splashtop 2014] and an
open-source solution [GamingAnywhere Web Page 2013]. The commercial products
are treated as black boxes, and general measurement methodologies are
developed to compare their performance in different aspects. The
open-source solution is a cloud gaming platform called
GamingAnywhere (GA) [Huang et al. 2014, GamingAnywhere Web Page 2013]. GA works for
screencast because cloud gaming is an extreme application of
screencast, which demands high video quality, high frame rate (in
frames per second, fps), and low interaction latency [Chen et al. 2014].
Nevertheless, using GA as a general screencast technology leaves
some room for optimization. For example, it is well known that popular video
coding standards, such as H.264 [Wiegand et al. 2003], are designed for natural
videos and may not be suitable for screen contents, also known as
compound images, which are combinations of computer-generated
texts and graphics, rendered 3D scenes, and natural
videos [Zhu et al. 2014].
Fortunately, GA [Huang et al. 2014, GamingAnywhere Web Page 2013] is extensible, portable,
configurable, and open. Therefore, developers and researchers are free
to use GA for systematic experiments to make design decisions for
optimized screencast. In this article, we design and conduct several such
experiments; e.g., we integrate GA with emerging video
codecs [x264 Web Page 2012, HEVC Test Model 2014] in order to conduct a user study using a real
screencast setup to quantify the gain of new video codecs. Our sample
experiments reveal the potential of using GA for screencast research
and development. More importantly, we demonstrate how to measure the
performance of screencast technologies and how to quantify the
pros and cons of different screencast technologies. The screencast measurement
setup and design are, therefore, useful in their own right, because they have not been
reported in the literature. One weakness of the state-of-the-art
open-source screencast technology GA [Huang et al. 2014, GamingAnywhere Web Page 2013] is its lack of bitrate adaptation,
which significantly degrades user experience. To address this limitation, we
develop and implement a rate adaptation mechanism in GA, which uses video packets to estimate
the available bandwidth and adjusts the streaming rate accordingly. The
enhanced GA incurs no network estimation overhead and reacts to network
dynamics promptly. Evaluation results show that the proposed rate adaptation
mechanism is effective and efficient.
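The general idea of reusing already-transmitted video packets as bandwidth probes can be illustrated with a packet-train dispersion estimate. This is a generic sketch, not the estimator developed in Section 6: it assumes the receiver logs the size and arrival time of packets that the sender emitted back to back (e.g., the packets of one video frame), and the helper names, the 0.8 safety margin, and the bitrate bounds are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    size_bytes: int   # size of one video packet
    arrival_s: float  # receiver-side arrival time in seconds


def dispersion_estimate_bps(train: list[Packet]) -> float:
    """Throughput estimate from a train of back-to-back video packets.

    The first packet only opens the measurement interval, so its bytes
    are not counted (the usual packet-train convention).
    """
    if len(train) < 2:
        raise ValueError("need at least two packets")
    span_s = train[-1].arrival_s - train[0].arrival_s
    if span_s <= 0:
        raise ValueError("arrival times must increase")
    bits = 8 * sum(p.size_bytes for p in train[1:])
    return bits / span_s


def next_bitrate_bps(estimate_bps: float, margin: float = 0.8,
                     min_bps: float = 500_000,
                     max_bps: float = 20_000_000) -> float:
    """Hypothetical controller: target a fraction of the estimate, clamped."""
    return max(min_bps, min(max_bps, margin * estimate_bps))


# Five 1250-byte packets arriving 1 ms apart: 40,000 bits over 4 ms.
train = [Packet(1250, i * 0.001) for i in range(5)]
estimate = dispersion_estimate_bps(train)
print(round(estimate))                    # 10000000
print(round(next_bitrate_bps(estimate)))  # 8000000
```

Because the probes are video packets that the sender transmits anyway, no extra traffic is injected. Note that when cross traffic is light, the dispersion of a short train reflects the bottleneck capacity rather than the available bandwidth, which is one reason a practical estimator needs more care than this sketch.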
The preliminary version of the current article was published in Hsu et
al. [Hsu et al. 2015], which contains extensive measurement studies leading to
various insights on where screencast technologies can be further optimized. The main
findings are as follows.
Considering diverse usage conditions and performance metrics,
there is no single winning screencast technology, which indicates that
there is still room to optimize the state-of-the-art screencast
technologies in the coming years.
Hardware video encoders significantly reduce the CPU usage at
screencast senders, while only slightly increasing the GPU usage and end-to-end latency;
hence, they are suitable for screencast technologies.
One way to better cope with nonzero packet loss rates is to employ the
reliable TCP protocol, but TCP does not work well when network latency
is long, which is in line with [Calagari et al. 2014]. Therefore, more comprehensive error
resilience tools are desired.
Screen contents are fairly different from
natural videos, and adopting emerging video codecs designed for screen
contents in screencast technologies leads to better Quality of
Experience (QoE).
In the current article, we make the following new contributions on optimizing
screencast technologies.
We design a new, non-intrusive available bandwidth estimator
for short-range Wi-Fi networks, which are the most popular networks
used in screencast scenarios.
We propose, implement, and evaluate a practical bitrate adaptation algorithm
based on the proposed available bandwidth estimation.
We conduct extensive experiments on the GA platform and the bitrate adaptation
algorithm to show the merits and practicality of the proposed solutions.
The article is organized as follows.
We review the
literature in Section 2. We customize GA to be a more
flexible platform for screencast in Section 3. This is
followed by the detailed measurement methodology given in
Section 4.
We present the GA-based quantitative
evaluations and user studies, and we discuss the design considerations
for future screencast technologies in Section 5.
Section 6 details a rate adaptation mechanism developed
by us. Section 7 concludes this article.
In addition, due to space limitations, we give the measurement results of
the state-of-the-art screencast technologies in Appendix B.
2 Related Work
In this section, we survey the literature in the following two directions: (i) screen sharing systems and
(ii) performance measurements of screencast platforms.
We summarize the major differences between related work and our current work
in Table I.
We also describe prior work on available bandwidth estimation and rate adaptation in
Appendix A, due to space limitations.
We investigate the key factors for implementing a successful screencast
technology using GamingAnywhere (GA). GA is not yet tailored for
screencast. For example, unlike powerful cloud gaming servers, the
computing devices used for screencast may be resource-constrained
low-end PCs or mobile/wearable devices, and thus screencast senders
must be lightweight. Moreover, the screen contents of screencast are
more diverse than those of cloud gaming: text-based contents in
word processing, slide editing, and Web browsing applications are
common in screencast scenarios. In this section, we discuss the
customization of GA for screencast, which also enables researchers and
developers to employ GA in performance evaluations to systematically
make design decisions.
3.1 Support of More Codecs
GA adopts H.264 as its default codec. Currently, the implementation is
based on libx264 and is accessed via the
ffmpeg/libav APIs.
However, we found that it is difficult to integrate other codec
implementations into GA under the current design. For example, if
we plan to use another H.264 implementation from Cisco [OpenH264 Web Page 2015],
we have to first implement it as an ffmpeg/libav module, and
integrating a new codec into ffmpeg/libav brings extra workload.
In addition, the ffmpeg/libav framework prevents a user from accessing
advanced features of a codec. For example, libx264 allows a user to
dynamically reconfigure the codec in terms of, e.g., frame rates, but
this is currently not supported by the ffmpeg/libav framework.
Therefore, we revise the module design of GA to allow implementing a
codec without integrating the codec into the ffmpeg/libav
framework.
At the same time, we also migrate the RTSP server from ffmpeg to
live555.
As a result, GA now supports a wide range of video codecs that
provide the required Session Description Protocol (SDP) parameters at
the codec initialization phase. A summary of currently supported codecs
and the associated SDP parameters is shown in
Table II.
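The decoupling described above can be pictured as a small plug-in interface in which each codec module exposes initialization, encoding, optional run-time reconfiguration, and its own SDP parameters, so that the streaming layer only talks to that interface. The Python sketch below is illustrative only: GA itself is written in C/C++, and the class and function names (`CodecModule`, `sdp_fmtp`, `reconfigure`, `register`, `DummyH264`) are hypothetical. The base64 strings are placeholders, not real parameter sets.

```python
from abc import ABC, abstractmethod


class CodecModule(ABC):
    """Hypothetical codec plug-in interface that does not depend on ffmpeg/libav."""

    name = "abstract"

    @abstractmethod
    def init(self, width: int, height: int, fps: int, bitrate_bps: int) -> None:
        """Configure the encoder; SDP parameters must be ready afterwards."""

    @abstractmethod
    def encode(self, frame: bytes) -> bytes:
        """Encode one raw frame and return the compressed bitstream."""

    @abstractmethod
    def sdp_params(self) -> dict[str, str]:
        """Codec-specific SDP fmtp parameters (cf. Table II)."""

    def reconfigure(self, **changes: int) -> bool:
        """Run-time reconfiguration; modules that cannot do it return False."""
        return False

    def sdp_fmtp(self, payload_type: int) -> str:
        """Render the a=fmtp attribute line for the RTSP/SDP layer."""
        params = ";".join(f"{k}={v}" for k, v in self.sdp_params().items())
        return f"a=fmtp:{payload_type} {params}"


REGISTRY: dict[str, type] = {}


def register(cls: type) -> type:
    """Class decorator that makes a codec module selectable by name."""
    REGISTRY[cls.name] = cls
    return cls


@register
class DummyH264(CodecModule):
    """Stand-in module that records its settings instead of encoding."""

    name = "dummy-h264"

    def init(self, width, height, fps, bitrate_bps):
        self.cfg = {"width": width, "height": height,
                    "fps": fps, "bitrate_bps": bitrate_bps}

    def encode(self, frame):
        return frame[:16]  # placeholder, not a real bitstream

    def sdp_params(self):
        # Placeholder values; a real module returns its own SPS/PPS here.
        return {"packetization-mode": "1",
                "sprop-parameter-sets": "Z0IAHg==,aM4G4g=="}

    def reconfigure(self, **changes):
        # Accept only settings that init() configured, e.g., fps or bitrate.
        if not set(changes) <= set(self.cfg):
            return False
        self.cfg.update(changes)
        return True


enc = REGISTRY["dummy-h264"]()
enc.init(1280, 720, 30, 3_000_000)
print(enc.sdp_fmtp(96))
print(enc.reconfigure(fps=60), enc.cfg["fps"])  # True 60
```

In this sketch, keeping SDP generation inside each module is what lets an RTSP server stay codec-agnostic, and exposing `reconfigure` directly is what a wrapper-based design cannot always offer.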
3.2 Hardware Encoder
Screencast servers may be CPU-constrained, and thus we integrate a
hardware encoder with GA as a reference implementation.
We choose a popular framework, Intel's Media SDK [Intel Web Page 2015],
to access the hardware encoder. The
hardware encoder is available on machines equipped with both an Intel
Core i-series CPU (2nd generation or later) and an Intel HD Graphics
video adapter.
To integrate the Intel hardware encoder into GA, we have to provide the
sprop-parameter-sets, which contains the SPS (Sequence
Parameter Set) and PPS (Picture Parameter Set) configurations of the
codec. After the codec is initialized, we can obtain these parameters
from the encoder context by retrieving the SPS and PPS as codec parameters,
i.e., by calling the MFXVideoENCODE_GetVideoParam function with an extended
buffer of type MFX_EXTBUFF_CODING_OPTION_SPSPPS.
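For readers unfamiliar with this parameter: RFC 6184 defines sprop-parameter-sets as a comma-separated list of base64-encoded parameter-set NAL units. The sketch below shows an alternative route to the same value, assuming an encoder that emits an Annex B byte stream (start-code delimited) whose header carries the SPS and PPS; it looks for NAL unit types 7 (SPS) and 8 (PPS). The function names and the toy byte values are illustrative, not taken from GA or Media SDK.

```python
import base64

H264_SPS, H264_PPS = 7, 8  # nal_unit_type values for SPS and PPS


def split_annexb(stream: bytes) -> list[bytes]:
    """Split an H.264 Annex B byte stream into NAL units.

    A 4-byte start code is a zero byte followed by the 3-byte code, so
    splitting on 00 00 01 and dropping trailing zero bytes covers both.
    A NAL unit never ends in a zero byte, so the stripping is safe.
    """
    units = [u.rstrip(b"\x00") for u in stream.split(b"\x00\x00\x01")[1:]]
    return [u for u in units if u]


def sprop_parameter_sets(stream: bytes) -> str:
    """Return the RFC 6184 sprop-parameter-sets value (SPS first, then PPS)."""
    sps, pps = [], []
    for nal in split_annexb(stream):
        nal_type = nal[0] & 0x1F  # low 5 bits of the 1-byte NAL header
        if nal_type == H264_SPS:
            sps.append(nal)
        elif nal_type == H264_PPS:
            pps.append(nal)
    if not sps or not pps:
        raise ValueError("stream header lacks SPS or PPS")
    return ",".join(base64.b64encode(n).decode("ascii") for n in sps + pps)


# Toy stream: SPS and PPS with 4-byte start codes, then an IDR slice (type 5).
header = (b"\x00\x00\x00\x01\x67\x42\x00\x1e"
          b"\x00\x00\x00\x01\x68\xce\x06\xe2"
          b"\x00\x00\x01\x65\x88\x84")
print(sprop_parameter_sets(header))  # Z0IAHg==,aM4G4g==
```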
The Intel hardware encoder supports only a limited set of options. In addition
to the setup of bitrate, frame rate, and GoP size, we use the following
default configurations for the codec: main profile, best quality, VBR
rate control, no B-frames, single decoded frame buffering, and sliced
encoding. We also tried to enable the intra-refresh feature, but
unfortunately this feature is not supported on all of our
Intel PCs.
We notice that Intel's video encoder only supports the NV12 pixel
format. Fortunately, it also provides a hardware-accelerated color
space converter. Thus, we can still take video sources with RGBA,
BGRA, and YUV420 formats; the video processing engine first
converts the input frames into the NV12 pixel format and then passes the
converted frames to the encoder. The CPU load reduction due to the
hardware encoder is significant, as we show in the experiments
in Section 5.
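To make the NV12 layout concrete: NV12 stores a full-resolution Y plane followed by a single plane of interleaved U and V samples, subsampled by two in both directions. The pure-Python sketch below mimics in software what the hardware color space converter does. It assumes BT.601 limited-range coefficients (the common 8-bit integer approximation), even frame dimensions, and a plain 2x2 average for chroma; the matrix and chroma siting actually used by Media SDK may differ, and the function names are hypothetical.

```python
def _yuv(r: int, g: int, b: int) -> tuple[int, int, int]:
    """BT.601 limited-range RGB -> YUV, 8-bit integer approximation."""
    y = ((66 * r + 129 * g + 25 * b + 128) >> 8) + 16
    u = ((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128
    v = ((112 * r - 94 * g - 18 * b + 128) >> 8) + 128
    return y, u, v


def to_nv12(pixels: bytes, width: int, height: int, order: str = "RGBA") -> bytes:
    """Convert packed 4-byte pixels (RGBA or BGRA) into an NV12 buffer."""
    if width % 2 or height % 2:
        raise ValueError("NV12 needs even width and height")
    if len(pixels) != 4 * width * height:
        raise ValueError("buffer size does not match dimensions")
    ri, gi, bi = (order.index(c) for c in "RGB")  # byte offsets of R, G, B
    y_plane = bytearray(width * height)
    u_sum = [0] * ((width // 2) * (height // 2))
    v_sum = [0] * len(u_sum)
    for row in range(height):
        for col in range(width):
            o = 4 * (row * width + col)
            y, u, v = _yuv(pixels[o + ri], pixels[o + gi], pixels[o + bi])
            y_plane[row * width + col] = y
            k = (row // 2) * (width // 2) + col // 2  # 2x2 block index
            u_sum[k] += u
            v_sum[k] += v
    uv_plane = bytearray()
    for us, vs in zip(u_sum, v_sum):
        uv_plane += bytes(((us + 2) // 4, (vs + 2) // 4))  # rounded 2x2 mean, U then V
    return bytes(y_plane) + bytes(uv_plane)


white = bytes([255, 255, 255, 255] * 4)       # 2x2 RGBA, all white
print(to_nv12(white, 2, 2).hex())             # ebebebeb8080
red_bgra = bytes([0, 0, 255, 255] * 4)        # 2x2 BGRA, all red
print(list(to_nv12(red_bgra, 2, 2, "BGRA")))  # [82, 82, 82, 82, 90, 240]
```

The output size is always width x height x 3/2 bytes, which is why NV12 halves the memory traffic of a 4-byte RGBA frame before encoding.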
3.3 Emerging Video Codecs
The revised GA design supports the emerging H.265 coding standard. To
be integrated with GA, an H.265 codec implementation has to provide
all three required parameters (VPS, SPS, and PPS, as shown in Table II).
We have integrated libx265 [x265 Web Page 2014] and the HEVC Test Model (HM)
[HEVC Test Model 2014] with GA.
HEVC has several emerging extensions, such as the Range
Extension (REXT) and Screen Content Coding (SCC) [Zhu et al. 2014]; SCC
in particular is designed for screencast and similar applications.
We note that neither libx265 nor HM is optimized for real-time
applications in our experiments. Longer encoding time, however, is not a
major concern for now: both implementations are emerging, and we
expect that they will be optimized before actual
deployments. Therefore, in Section 5, we evaluate these
emerging codecs with a focus on their achieved user experience (e.g.,
graphics quality) when encoding screen contents, without considering their
running time.
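Unlike H.264, an HEVC NAL unit header is two bytes long, and the unit type occupies bits 1 to 6 of the first byte; VPS, SPS, and PPS are types 32, 33, and 34. RFC 7798 carries them in SDP as the separate fmtp parameters sprop-vps, sprop-sps, and sprop-pps. The sketch below assumes the codec hands over each parameter-set NAL unit already separated (without start codes); the function name and the toy byte values are hypothetical.

```python
import base64

HEVC_PARAM_SETS = {32: "sprop-vps", 33: "sprop-sps", 34: "sprop-pps"}


def hevc_sprop_fmtp(nal_units: list[bytes], payload_type: int = 96) -> str:
    """Map HEVC VPS/SPS/PPS NAL units to an RFC 7798 style a=fmtp line."""
    found: dict[str, list[str]] = {}
    for nal in nal_units:
        nal_type = (nal[0] >> 1) & 0x3F  # 6-bit nal_unit_type
        key = HEVC_PARAM_SETS.get(nal_type)
        if key:
            found.setdefault(key, []).append(base64.b64encode(nal).decode("ascii"))
    missing = [k for k in HEVC_PARAM_SETS.values() if k not in found]
    if missing:
        raise ValueError(f"missing parameter sets: {missing}")
    # Emit VPS, SPS, PPS in a fixed order regardless of input order.
    params = ";".join(f"{k}={','.join(found[k])}" for k in HEVC_PARAM_SETS.values())
    return f"a=fmtp:{payload_type} {params}"


vps, sps, pps = b"\x40\x01\x0c", b"\x42\x01\x01", b"\x44\x01\xc1"  # toy units
print(hevc_sprop_fmtp([pps, vps, sps]))
# a=fmtp:96 sprop-vps=QAEM;sprop-sps=QgEB;sprop-pps=RAHB
```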
4 Measurement Methodology
In this section, we present the measurement methodology to systematically
compare the state-of-the-art screencast technologies.
4.1 Screencast Technologies
The following five commercial screencast technologies are considered in our experiments.
Chromecast is a digital media player that can stream
audiovisual contents directly via Wi-Fi. For screencast, a
user can use the Google Cast extension for Chrome, which uses the WebRTC API to
transmit screen contents from the Web browser or desktop to the
Chromecast device.
Miracast is a peer-to-peer wireless standard for screencast
over Wi-Fi Direct. Miracast-compatible devices can serve as Miracast
senders and receivers. Existing operating systems with built-in Miracast support
include Android 4.2 or later, BlackBerry 10.2, and Microsoft Windows
8.1. For streaming screens to a device that does not support Miracast,
there are also Miracast adapters capable of rendering the screens
through HDMI or USB ports.
MirrorOp and Splashtop offer pure software solutions,
which require users to install proprietary applications at both the
sender and the receiver. Although MirrorOp and Splashtop use closed
protocols, their developers offer the applications on multiple operating systems,
including Windows and Mac OS X.
In addition, the open-source GA is
evaluated as a screencast technology as well.
4.2 Content Types
We study how the screencast technologies perform when streaming
different types of contents. We consider 9 content types in the
following 3 categories:
Gaming: including first-person shooter, racing, and
turn-based strategy games.
Movie/TV: including dialogue movie scene, car chasing movie
scene, and talk show.
Applications: including Google Street View
browsing, slide editing, and Web surfing in Chrome.
For fair comparisons, we record the screens of different content types
into 1280x720 videos. In particular, we extract one minute of representative
video for each content type and concatenate them into a single video of
about 9 minutes. We insert 2-second white video frames between any two
adjacent content types to reset the video codecs. In this way, the
measurement results collected from adjacent content types do not
interfere with one another.
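Because per-content-type results are computed from one concatenated test video, every frame index has to be mapped back to its content type or to a white separator. A minimal sketch, assuming exactly 60 s per clip at 30 fps and a 2 s separator only between adjacent clips (no leading or trailing separator); the playback order and the short labels below are hypothetical.

```python
CONTENT_TYPES = [  # assumed playback order of the nine clips
    "shooter game", "racing game", "strategy game",
    "dialogue scene", "car chase scene", "talk show",
    "Street View", "slide editing", "Web surfing",
]


def content_type_at(frame: int, fps: int = 30,
                    clip_s: int = 60, gap_s: int = 2) -> str:
    """Map a frame index of the test video to its content type or 'separator'."""
    clip_f, period_f = clip_s * fps, (clip_s + gap_s) * fps
    idx, offset = divmod(frame, period_f)  # which clip+gap period, and where in it
    last = len(CONTENT_TYPES) - 1
    if frame < 0 or idx > last or (idx == last and offset >= clip_f):
        raise ValueError("frame outside the test video")
    return CONTENT_TYPES[idx] if offset < clip_f else "separator"


print(content_type_at(0))     # shooter game
print(content_type_at(1800))  # separator
print(content_type_at(1860))  # racing game
```

Frames that fall on a separator can then be excluded when averaging metrics per content type.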
4.3 Workload and Network Conditions
‡ If not otherwise specified, the PC is a Lenovo ThinkCentre M92p, and the laptop is a Lenovo ThinkPad X240.
We also study how the screencast performance is affected under
different workload settings and network conditions, which we believe
have direct and nontrivial impacts on screencast quality. Workload
parameters are related to the quality of source videos, including
frame rate and resolution. We change the frame sampling rates to
generate multiple videos, and set 30 fps as the default frame rate. We
also vary the resolution among 1280x720, 896x504, and 640x480. For the
latter two cases, we place the video at the center of the (larger)
screen without resizing it, because we believe image resizing
would cause loss of details and therefore bias our results. As to network
conditions, we use dummynet to control the bandwidth, delay, and
packet loss rate (packet loss) of the outgoing channel of
senders. By default, the bandwidth is not throttled, the delay is 0 ms, and
the packet loss rate is 0%.
In our experiments, one workload or network parameter is
varied at a time while all other parameters are fixed at their
default values. The list of parameters is given in
Table III, with the respective default values in
boldface. For screencast technologies that support both UDP and TCP
protocols, the default protocol is UDP.
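The one-parameter-at-a-time design and the dummynet shaping can be scripted together. The sketch below enumerates the configurations and, for each, produces commands in the syntax of ipfw, the dummynet front end (`bw` with a unit suffix, `delay` in milliseconds, `plr` as a probability between 0 and 1, and `bw 0` meaning unlimited). Only the defaults and the three resolutions come from the text; the other levels in `SWEEP` are placeholders standing in for Table III, and the pipe number is arbitrary.

```python
DEFAULTS = {"fps": 30, "resolution": "1280x720",
            "bw_kbps": 0, "delay_ms": 0, "plr": 0.0}  # bw 0 = not throttled

SWEEP = {  # placeholder levels; only the resolutions are listed in the text
    "fps": [15, 60],
    "resolution": ["896x504", "640x480"],
    "bw_kbps": [4000, 8000],
    "delay_ms": [100, 200],
    "plr": [0.01, 0.05],
}


def one_at_a_time(defaults: dict, sweep: dict) -> list[dict]:
    """The default configuration plus one variant per non-default level."""
    configs = [dict(defaults)]
    for key, levels in sweep.items():
        for level in levels:
            if level != defaults[key]:
                configs.append({**defaults, key: level})
    return configs


def dummynet_cmds(cfg: dict, pipe: int = 1) -> list[str]:
    """ipfw commands that shape the sender's outgoing traffic via one pipe."""
    bw = f"{cfg['bw_kbps']}Kbit/s" if cfg["bw_kbps"] else "0"  # 0 = unlimited
    return [
        f"ipfw pipe {pipe} config bw {bw} delay {cfg['delay_ms']} plr {cfg['plr']}",
        f"ipfw add pipe {pipe} ip from any to any out",
    ]


configs = one_at_a_time(DEFAULTS, SWEEP)
print(len(configs))                  # 11
print(dummynet_cmds(configs[5])[0])  # ipfw pipe 1 config bw 4000Kbit/s delay 0 plr 0.0
```

Generating the commands rather than typing them keeps every run of the ten repetitions on exactly the same network settings.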
4.4 Experiment Setup
There are several components in the experiment: a sender and a receiver
for each screencast technology, and a Wi-Fi AP, which is mandatory for
all technologies except Miracast (based on Wi-Fi Direct). The
specifications of the screencast technologies are summarized in
Table IV, and the detailed experiment setups are given
below.
AirPlay. The sender is a MacBook Pro running OS X 10.9.2, with a 2.4 GHz Intel Core i5 processor and 8 GB memory, while the receiver is an Apple TV. Both are connected to the same Wi-Fi AP so that the sender can discover, connect to, and stream screens to the receiver.
Chromecast. The sender is a Lenovo ThinkPad X240
notebook running Windows 8.1, with a 2.6 GHz Intel Core i5 processor and
8 GB memory, and the receiver is a Chromecast dongle. The only way to
screencast using Chromecast is via the Google Cast Chrome extension. Once
the sender is connected to the Wi-Fi AP, it can discover and connect to
any available devices in the same Wi-Fi network.
Miracast. We use the Lenovo notebook as the sender. For
the receiver, we use a NETGEAR Push2TV Miracast adapter. Miracast is
based on Wi-Fi Direct and supported by Windows 8.1. As long as the
receiver is placed within the wireless transmission range of the
sender, Windows 8.1 provides a simple user interface for screencasting
the sender's desktop to the receiver.
MirrorOp and Splashtop. The Lenovo notebook
serves as the sender, while a PC running Windows 7 with an Intel Core
i7 processor serves as the receiver. To use these two services, a user
needs to create an account, and run the sender and receiver programs on
the respective machines. Once both machines are logged in, they can
discover and connect to each other.
In addition, experiments on GA are also conducted using the same setup
as MirrorOp and Splashtop. We note that there may be multiple
implementations for certain technologies, e.g., Miracast, but we cannot
cover all the implementations in this work. We pick a popular
implementation for each technology, and detail the measurement
methodology so that interested readers can apply the methodology to
other implementations.
4.5 Performance Metrics
We measure the following performance metrics that are crucial to
screencast user experience.
Bitrate. The average amount of data per second transmitted from
the sender to the receiver, which is important because the wireless spectrum and total bandwidth are
limited and shared by all applications/users.
End-to-end latency (latency). The time difference between
when a video frame is rendered at the sender and when it is rendered at the receiver, which
is especially important for interactive applications. The user
experience also drops if the latency jitter (i.e., the variation of latency) is high.
Frame loss rate (frame loss). The fraction of video frames
that are not rendered at the receiver, which greatly affects the
viewing experience.
When presenting the measurement results, 95% confidence intervals of
the averages are given as error bars in the figures whenever
applicable.
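With ten repetitions per configuration, such error bars can be computed with a Student's t interval. The sketch below assumes t-based intervals (the text does not state which interval is used); 2.262 is the two-sided 95% t quantile for 9 degrees of freedom, so the default applies only to exactly ten samples. The function name is hypothetical.

```python
import math
import statistics

T_975_DF9 = 2.262  # two-sided 95% t quantile for 9 degrees of freedom (n = 10)


def mean_ci95(samples: list[float], t_crit: float = T_975_DF9) -> tuple[float, float]:
    """Mean and half-width of a t-based 95% confidence interval.

    The default t_crit is only valid for exactly ten samples; pass the
    matching quantile for other sample sizes.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    half = t_crit * statistics.stdev(samples) / math.sqrt(len(samples))
    return statistics.fmean(samples), half


# Ten repetitions, e.g., the mean latency (ms) of one configuration.
mean, half = mean_ci95([10.0] * 5 + [12.0] * 5)
print(f"{mean:.3f} +/- {half:.3f}")  # 11.000 +/- 0.754
```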
Figure 1: Experiment setup for: (a) bitrate/video quality and (b) latency; (c) actual testbed for latency measurements in our lab.
4.6 Experiment Procedure
For each technology, we first connect the sender and receiver, play the
video with diverse content types at the sender, and measure the four
performance metrics. We repeat the experiment ten times with each
configuration (i.e., workload and network parameters). To facilitate
our measurements, we have added a unique color bar at the top of each
frame of the source contents as its frame id, which can be
recognized programmatically (cf. Figure 1(c)).
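The encoding of the color bar is not specified here. One possible scheme, assumed purely for illustration, splits the bar into N equal-width cells and paints each cell black or white to encode one bit of the frame id (most significant bit first); decoding averages the luma of each cell and thresholds it at mid-gray, which tolerates moderate coding noise. The sketch operates on one row of luma samples taken from the bar; the 16-bit width and the function names are hypothetical.

```python
BLACK, WHITE = 16, 235  # limited-range luma levels for the two cell colors


def encode_bar(frame_id: int, width: int, bits: int = 16) -> list[int]:
    """Luma values of one bar row: one black/white cell per id bit, MSB first."""
    if width < bits:
        raise ValueError("bar too narrow for the number of bits")
    if not 0 <= frame_id < (1 << bits):
        raise ValueError("frame id out of range")
    cell = width // bits
    row: list[int] = []
    for i in range(bits):
        bit = (frame_id >> (bits - 1 - i)) & 1
        row += [WHITE if bit else BLACK] * cell
    return row + [BLACK] * (width - cell * bits)  # pad leftover pixels


def decode_bar(row: list[int], bits: int = 16, threshold: float = 128) -> int:
    """Recover the id by thresholding the mean luma of each cell."""
    cell = len(row) // bits
    if cell == 0:
        raise ValueError("row too short")
    frame_id = 0
    for i in range(bits):
        samples = row[i * cell:(i + 1) * cell]
        bit = 1 if sum(samples) / cell > threshold else 0
        frame_id = (frame_id << 1) | bit
    return frame_id


# Round trip with +/-40 alternating noise standing in for coding artifacts.
row = encode_bar(12345, 1280)
noisy = [min(255, max(0, v + (40 if i % 2 else -40))) for i, v in enumerate(row)]
print(decode_bar(noisy))  # 12345
```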
To measure the bitrate used by the screencast technologies, we run a
packet analyzer at the sender to keep track of the outgoing packets
during the experiments. For measuring the video quality, we direct the
HDMI output of the receiver to a PC, which is referred to as the
recorder. The recorder PC is equipped with an Avermedia video capture
card to record the videos. To quantify the quality degradation, each
frame of the recorded video is matched to its counterpart in the source
video, using the frame id. Last, we calculate the PSNR and SSIM values
as well as the frame loss rate by matching the frames. This setup is
illustrated in Figure 1(a).
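Once recorded frames are matched to source frames by id, PSNR and frame loss follow directly. The sketch below assumes 8-bit luma frames given as flat byte strings, keyed by decoded frame id (one copy kept per id), and uses the standard definition PSNR = 10 log10(255^2 / MSE); SSIM is omitted for brevity. How identical frames (infinite PSNR) and lost frames enter the reported averages is not specified in the text, so the sketch returns per-frame values and leaves aggregation open. The function names are hypothetical.

```python
import math


def psnr(ref: bytes, test: bytes, peak: int = 255) -> float:
    """PSNR (dB) between two equally sized 8-bit frames; inf if identical."""
    if not ref or len(ref) != len(test):
        raise ValueError("frames must be non-empty and equally sized")
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    return math.inf if mse == 0 else 10 * math.log10(peak * peak / mse)


def match_frames(source: dict[int, bytes],
                 recorded: dict[int, bytes]) -> tuple[dict[int, float], float]:
    """Per-frame PSNR of frames shown at the receiver, and the frame loss rate."""
    shown = {fid: psnr(src, recorded[fid])
             for fid, src in source.items() if fid in recorded}
    loss_rate = 1 - len(shown) / len(source)
    return shown, loss_rate


ref = bytes([100] * 4)
noisy = bytes([110] * 4)  # uniform error of 10 gray levels, MSE = 100
scores, loss = match_frames({1: ref, 2: ref, 3: ref, 4: ref}, {1: ref, 3: noisy})
print(scores[1], round(scores[3], 2), loss)  # inf 28.13 0.5
```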
To measure the user-perceived latency, we direct the rendered videos of
both the sender and receiver to two side-by-side monitors via HDMI (for
the sake of larger displays). We then set up a Canon EOS 600D camera to
record the two monitors at the same time, as shown in
Figure 1(c). To capture every frame rendered on the
monitors, we set the recording frame rate of the camera to 60 fps,
which equals the highest frame rate in our workload settings. The
recorded video is then processed to compute the latency of each frame
by matching the frames based on frame ids and comparing the timestamps
at which each frame is rendered by the sender and the receiver. The
setup is shown in Figure 1(b).
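The camera-based procedure reduces to finding, for every frame id, the first camera frame in which it appears on each monitor. The sketch assumes the frame ids have already been decoded for both monitors in every camera frame (None where unreadable) and that the camera runs at 60 fps, so each latency sample is quantized to about 16.7 ms. Jitter is reported here as the standard deviation of latency, which is one common choice; the function names are hypothetical.

```python
import statistics
from typing import Optional

CAMERA_FPS = 60  # recording rate; one camera frame is about 16.7 ms


def first_seen(ids: list[Optional[int]]) -> dict[int, int]:
    """Map each frame id to the first camera frame that shows it."""
    seen: dict[int, int] = {}
    for cam_idx, fid in enumerate(ids):
        if fid is not None and fid not in seen:
            seen[fid] = cam_idx
    return seen


def latency_stats(sender_ids: list[Optional[int]],
                  receiver_ids: list[Optional[int]],
                  camera_fps: int = CAMERA_FPS):
    """Per-frame latency in ms, plus its mean and jitter (standard deviation)."""
    tx, rx = first_seen(sender_ids), first_seen(receiver_ids)
    lat = {fid: (rx[fid] - tx[fid]) * 1000 / camera_fps
           for fid in tx if fid in rx}
    if len(lat) < 2:
        raise ValueError("need at least two frames seen on both monitors")
    values = list(lat.values())
    return lat, statistics.fmean(values), statistics.stdev(values)


# Frame ids decoded from each camera frame for the two monitors.
sender = [1, 1, 2, 2, 3, 3]
receiver = [None, None, None, 1, 1, 2, 2, 2, 3]
lat, mean_ms, jitter_ms = latency_stats(sender, receiver)
print({k: round(v, 1) for k, v in lat.items()})  # {1: 50.0, 2: 50.0, 3: 66.7}
```

Frames seen on the sender monitor but never on the receiver monitor are simply skipped here; they are the ones counted by the frame loss metric instead.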
Last, we note that we had to run each experiment twice: once for
bitrate and video quality (Figure 1(a)), and
once for latency (Figure 1(b)). This is
because each receiver has only a single HDMI output, but the two
measurement setups are quite different. Fortunately, our experiments
are highly automated in a controlled environment, so running the two
setups separately is unlikely to bias our results. The actual testbed is shown in
Figure 1(c).
5 Design Considerations
Our performance evaluations on screencast technologies given in
Appendix B lead to two main observations: (i) screencast
technologies all have advantages and disadvantages and (ii) deeper
investigations to identify the best design decisions are crucial. In
this section, we present a series of GA-based experiments to analyze
several design considerations. We emphasize that our list of
design considerations is not exhaustive, and readers are free to
leverage open-source screencast technologies such as GA [Huang et al. 2014]
and DisplayCast [Chandra et al. 2012, Chandra et al. 2014] for similar studies.
5.1 Software vs. Hardware Encoding
We study the implications of switching from a software video encoder to a
hardware encoder in GA, and we compare their performance against the
commercial screencast technologies. We use the experiment setup
presented in Table IV, and we stream the 9-minute 18-second
video using the default settings given in
Table III. We consider three performance metrics: CPU
usage, GPU usage, and end-to-end latency. For CPU/GPU usage, we take a
sample every second, and the end-to-end latency is calculated for every
frame. Then, we report the average CPU/GPU usage incurred by
individual screencast technologies in Figure 2. In
this figure, GA and GA (HE) represent GA with software and hardware
video encoders, respectively. Moreover, the numbers above the points are
the average end-to-end latencies.
We draw several observations from this figure. First, the hardware encoder
dramatically reduces the CPU usage of GA: GA (HE) uses less than 1/3 of
the CPU consumed with the software encoder. Second, with the hardware
encoder, GA incurs lower CPU usage than MirrorOp,
Chromecast, and Splashtop. While AirPlay and Miracast consume less CPU
than GA with the hardware encoder, they achieve inferior coding
efficiency, as illustrated in Figures 13(a) and
13(d). More specifically, although AirPlay and Miracast
incur much higher bitrates, their achieved video quality levels are no
better than those of other screencast technologies. We conclude that AirPlay and
Miracast trade bandwidth usage (coding efficiency) for lower CPU load,
so as to support less powerful mobile devices, such as iOS and
BlackBerry devices. Third, both GA and GA (HE) achieve very low latency: up to
18 times lower than some screencast technologies. Such low end-to-end
latency stems from a design decision GA makes as a cloud gaming platform,
namely zero playout buffering [Huang et al. 2014], which is
crucial for highly interactive applications during screencasting. We
note that GA (HE) leads to 26 ms longer latency than GA. This is due
to the less flexible frame buffer management mechanism in Intel's Media
SDK framework [Intel Web Page 2015], which prevents us from performing the more
detailed latency optimizations that we have done in ffmpeg/libav.
In summary, the hardware video encoder largely reduces the CPU
usage, while slightly increasing the GPU usage and end-to-end latency. It is therefore
well worth considering when developers build future screencast technologies.
Figure 2: Hardware encoder reduces the CPU usage of GA. The numbers below
screencast technologies represent end-to-end latency, and the lengths of
horizontal (and vertical) line segments represent the 95% confidence
intervals of mean CPU usage (and GPU usage), respectively.
Figure 3: The impacts of TCP and UDP protocols. The numbers below symbols
represent graphical quality in PSNR. Each pair of experiments with
identical settings (except for the transport protocol) is connected by
dashed lines, and the lengths of horizontal (and vertical) line segments
represent the 95% confidence intervals of mean latency (and frame loss
rate), respectively.
5.2 Comparison of Transport Protocols
The experiment results given in Appendix B indicate that
GA is vulnerable to nontrivial packet loss rates. This may be attributed
to the fact that GA employs the UDP protocol by default, and a quick
fix may be switching to the reliable TCP protocol. Therefore, we next
conduct experiments using GA with the UDP and TCP protocols. We
adopt the default settings as above and vary the network bandwidth and
delay settings. We consider 3 performance metrics: end-to-end latency,
frame loss rate, and video quality in PSNR, and we report the average
results over the 9-minute 18-second video in
Figure 3, where two corresponding points (UDP versus TCP) are connected
by dashed lines. The annotations next to the
dashed lines are network conditions, and the numbers next to the points are the PSNR
values representing the resulting video quality rendered at the client.
We make the following observations. When the network delay is low, TCP
always leads to a lower frame loss rate: a 2% difference is observed. However,
when the delay is longer, say ≥ 100 ms, TCP results in an even higher frame
loss rate. This can be attributed to the longer delay caused by TCP, which makes
more packets miss their playout deadlines and thus become useless. Moreover,
TCP usually incurs slightly longer end-to-end latency, except when we set the
bandwidth to 4 Mbps, in which case it leads to a much longer latency. On the other hand,
under 4 Mbps bandwidth, UDP suffers from higher packet loss rates and thus leads to
lower video quality; in particular, UDP results in 2.5 dB lower video quality than TCP.
Figure 4: Video quality achieved by different screencast technologies on diverse
content types, in: (a) PSNR and (b) SSIM.
In summary, Figure 3 shows that the TCP protocol
may be used as a basic error resilience tool of GA, but it does not
perform well when the network delay is long or when the network
bandwidth is not always sufficiently provisioned. This is in line with
the well-known limitation of TCP: it suffers from degraded performance
in long fat pipes [Kleinrock 1992], due to the widely adopted
congestion control algorithms. Hence, more advanced error resilience
tools are desired.
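A back-of-the-envelope calculation illustrates why TCP struggles in this regime. The bandwidth-delay product (BDP) is the amount of data TCP must keep in flight to fill the pipe, and the well-known approximation by Mathis et al. bounds steady-state TCP throughput under random loss p by roughly (MSS/RTT) x sqrt(3/(2p)). The sketch uses an MSS of 1460 bytes and example RTT and loss values chosen only for illustration; it is a simplified model, not a measurement of GA.

```python
import math


def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to fill the pipe."""
    return bandwidth_bps * rtt_s / 8


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Mathis et al. approximation of steady-state TCP (Reno) throughput."""
    if not 0 < loss < 1:
        raise ValueError("loss must be in (0, 1)")
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5 / loss)


# Example values (assumed): 4 Mbps link, 100 ms RTT, 1% random loss.
print(bdp_bytes(4e6, 0.1))                            # 50000.0
print(round(mathis_throughput_bps(1460, 0.1, 0.01)))  # 1430502
```

With these numbers, TCP needs about 50 KB in flight to fill the link, while the loss-limited rate of roughly 1.4 Mbps falls well short of 4 Mbps; since the bound scales with 1/RTT, longer delays push it lower still.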
5.3 Comparison of Video Codecs
Under the default settings, we report the achieved video quality in
Figure 4. This figure shows that MirrorOp
and Splashtop achieve good video quality for all content types, while
the other screencast technologies suffer from degraded video quality
for some content types. For example, AirPlay leads to inferior PSNR for
Applications, and GA results in lower PSNR/SSIM for Movie/TV.
The degradation is most pronounced in PSNR. For example, for Web
browsing, AirPlay, Chromecast, and Miracast achieve only about 22 dB in
PSNR. This may be caused by the particular characteristics of Web
browsing videos: the sharp edges of text are easily affected by the
ringing artifacts of standard video codecs, such as
H.264 [Wiegand et al. 2003]. Recently, Screen Content Coding (SCC) has been
proposed [Zhu et al. 2014] as an extension to the High Efficiency Video
Coding (HEVC) standard. SCC is built on top of the Range Extension
(REXT) of the HEVC standard, which expands the supported image bit
depths and color sampling formats for high-quality video coding.
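Since the comparison above hinges on PSNR, the following minimal sketch shows the standard definition, PSNR = 10 log10(MAX^2 / MSE), applied to two equal-length sequences of 8-bit samples. Treating a frame as a flat list of luma samples is a simplifying assumption; the sketch does not reproduce the exact quality-assessment tooling used in our measurements.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], distorted: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length 8-bit sample sequences."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)
```

For instance, an MSE of 400 (e.g., a uniform error of 20 gray levels) yields about 22.1 dB, which gives a sense of how large the errors around text edges are when Web browsing content is reproduced at roughly 22 dB.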
In the following, we conduct a separate study to investigate the
benefits of the emerging video coding standards: H.265 REXT, which
is designed for natural videos, and H.265 SCC, which is designed
for screen content. For comparison, we also include x264 with two
sets of coding parameters: the real-time parameters used by GA,
denoted as H.264 RT, and high-quality parameters with most
optimization tools enabled, denoted as H.264 SLOW. Specifically,
we select 5 screen content videos from the HEVC SCC test sequences:
BasketballScreen (2560x1440), Console (1920x1080), Desktop (1920x1080),
MissionControl3 (1920x1080), and Programming (1280x720).
We encode the first 300 frames of each video using the four codecs at
512 kbps on an AMD 2.6 GHz CPU. Table V gives the
resulting video quality, which reveals that H.264 RT results in
inferior video quality. With optimization tools enabled, H.264 SLOW achieves
video quality comparable to H.265 REXT, which in turn is outperformed by
H.265 SCC by up to about 5 dB. This table shows the potential of the
emerging H.265 video codecs.
Figure 5: QoE scores achieved by different codecs: (a) overall scores and (b) individual videos.
We next conduct a user study to obtain the QoE scores achieved by
different codecs. We randomly pick 40 frames from each video and
extract these frames from the reconstructed videos of the 4 codecs. We
save the chosen frames as lossless PNG images and create a Web site to
collect input from the general public. We present images encoded by two
random codecs side by side and ask viewers to perform paired comparisons. We
conducted the user study in September 2014 with 126 paid
subjects, who completed 180 sessions with 7,200 paired comparisons;
the subjects spent 27.2 hours in total on the study. We compute
the QoE scores using the Bradley-Terry-Luce (BTL)
model [Wu et al. 2013] and normalize the scores to the range
between 0 (worst experience) and 1 (best experience). We plot the
overall average and per-video QoE scores in Figure 5.
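The text states only that QoE scores are derived with the BTL model and normalized to [0, 1]. As a minimal sketch of how such scores can be obtained from raw (winner, loser) judgments, the code below fits BTL strengths with the classic minorization-maximization (Zermelo) iteration and then applies min-max normalization. Both the fitting procedure and the normalization are our assumptions rather than the exact procedure of [Wu et al. 2013], and `btl_scores` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def btl_scores(comparisons: Iterable[Tuple[str, str]], iterations: int = 200) -> Dict[str, float]:
    """Fit Bradley-Terry-Luce strengths from (winner, loser) pairs, then min-max normalize to [0, 1]."""
    wins: Dict[str, int] = defaultdict(int)
    pair_counts: Dict[Tuple[str, str], int] = defaultdict(int)
    items: Set[str] = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[tuple(sorted((winner, loser)))] += 1  # comparisons per unordered pair
        items.update((winner, loser))
    strength = {i: 1.0 for i in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            # MM update: gamma_i = W_i / sum_j n_ij / (gamma_i + gamma_j)
            denom = 0.0
            for (x, y), n in pair_counts.items():
                if i in (x, y):
                    j = y if i == x else x
                    denom += n / (strength[i] + strength[j])
            updated[i] = wins[i] / denom if denom > 0 else strength[i]
        total = sum(updated.values())
        strength = {i: s * len(items) / total for i, s in updated.items()}  # fix the scale
    lo, hi = min(strength.values()), max(strength.values())
    span = hi - lo
    return {i: (s - lo) / span if span > 0 else 0.5 for i, s in strength.items()}
```

Note that if one codec wins every comparison it is involved in, the BTL maximum-likelihood estimate does not exist and its raw strength keeps growing with more iterations; the min-max normalization still maps it to 1.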
We make a number of observations on this figure. First, H.265 SCC
outperforms H.265 REXT for all videos, demonstrating the effectiveness
of H.265 SCC. Second, the H.264 RT codec results in very low QoE
scores, while the H.264 SLOW codec achieves video quality comparable
to H.265 SCC. However, a closer look at H.264 SLOW reveals that its
encoding speed can drop below 1 fps, which makes it unsuitable for
real-time applications such as screencasting.
In summary, Figures 4 and
5 show that different content types call for different
video codecs; e.g., the emerging H.265 SCC codec is better suited to
screen content, which comprises text, graphics, and natural images.
Figure 6: Sample blocking artifacts observed in Miracast.
5.4 Necessity of Rate Adaptation
We repeat the experiments with the bandwidth varied between 4 Mbps and 1 Mbps.
Once the bandwidth drops below 3 Mbps, we observe various negative impacts on
some screencast technologies, including slow responsiveness, blocking artifacts,
frozen screens, consecutive lost frames, and disconnections between the sender
and the receiver. Figure 6 gives a sample screen with serious artifacts.
Table VI presents the negative impacts observed under 1 Mbps bandwidth, which
clearly shows that most screencast technologies suffer from at least two types
of them. AirPlay performs the best, which is consistent with our
observation in Appendix B.2: AirPlay actively
adapts its bitrate to the changing bandwidth. On the other hand,
although Chromecast and Miracast also actively adapt their bitrate,
they still perform poorly under low bandwidth. Furthermore, the ordinary GA
(without rate adaptation), MirrorOp, and Splashtop do not adapt their bitrate
to the bandwidth at all; thus, they sometimes under-utilize the available
bandwidth, and at other times they send excessive traffic and suffer from
unnecessary packet loss and quality degradation. These observations clearly
show that more carefully designed rate adaptation mechanisms are
needed for screencast technologies.
Thus, we develop and implement a rate adaptation mechanism in the open-source
GA and conduct experiments to evaluate its performance and efficiency.
The details are given in the next section.
6 An Adaptive Screencast Platform
The crux of adaptive screencasting is an accurate and efficient available bandwidth
estimation algorithm.
However, most existing algorithms are not suitable for screencast platforms,
because they send probing packets that generate extra traffic. This overhead is
acceptable for a one-time estimate before streaming commences, but not for
continuous bandwidth estimation.
To better understand the overhead, we configure
Iperf [Iperf Web Page 2015] and WBest+ [Farshad et al. 2014] to estimate
the available bandwidth for 2 minutes, and plot the probing traffic overhead in Figure 7.
Iperf (OT) and WBest+ (OT) send probing packets only once, at the beginning of the streaming,
and thus cannot dynamically estimate the available bandwidth for rate adaptation.
In contrast, we configure Iperf (P) and WBest+ (P) to send probing packets
every 20 seconds for adaptive estimation. This figure shows that Iperf incurs much higher
probing overhead, especially for dynamic estimation, which prevents screencast platforms
from sending videos at higher bitrates.
WBest+ introduces about 3.4 Mbit of probing overhead per estimation.
Instead, we develop a non-intrusive and flexible algorithm, inspired by WBest+, that leverages video packets
as probing packets. Indicated as GA in this figure, our algorithm (detailed below) incurs
no probing overhead.
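The cost of periodic probing can be put in perspective with simple arithmetic. The sketch below assumes, for illustration only, that one probe is sent at t = 0 and then every 20 seconds strictly before the end of the 2-minute run, each costing the roughly 3.4 Mbit reported above for WBest+; the actual probe timing in Figure 7 may differ, and `periodic_probing_overhead` is a hypothetical helper.

```python
import math
from typing import Tuple


def periodic_probing_overhead(bits_per_estimate: float, interval_s: float,
                              duration_s: float) -> Tuple[int, float, float]:
    """Return (probe count, total probe bits, average probe rate in bit/s) over duration_s,
    assuming one probe at t = 0 and one every interval_s seconds strictly before duration_s."""
    probes = math.ceil(duration_s / interval_s)
    total_bits = probes * bits_per_estimate
    return probes, total_bits, total_bits / duration_s


if __name__ == "__main__":
    count, total, rate = periodic_probing_overhead(3.4e6, 20, 120)
    print(f"{count} probes, {total / 1e6:.1f} Mbit total, {rate / 1e6:.2f} Mbit/s on average")
```

Under this assumption, periodic WBest+ probing amounts to 6 probes and about 20.4 Mbit over 2 minutes, i.e., roughly 0.17 Mbit/s of traffic that carries no video, whereas reusing video packets as probes avoids this cost entirely.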
Figure 7: Probing overhead of Iperf and WBest+ compared with the non-intrusive GA.
Figure 8: Overview of the components for rate adaptation.
6.1 Overview of Adaptation Support
Figure 8 gives a high-level overview of our proposed rate
adaptation mechanism for the GA [Huang et al. 2014] screencast platform.
A typical screencast platform consists of several components,
presented below. The server contains: (i) a frame buffer, which is the display
memory holding the screen contents, (ii) a video capturer that repeatedly retrieves the
screen contents from the frame buffer, (iii) a video encoder that encodes the captured
video, and (iv) an RTSP/RTP sender that sends the coded video to the client via
wired or wireless networks. The client contains: (i) an RTSP/RTP receiver that receives
the coded video from the network, (ii) a video decoder that decodes the coded video into
screen contents, and (iii) a display that renders the screen contents to the user. These 7 components
are standard in real-time video streaming applications, including screencast platforms.
To support the rate adaptation mechanism, we add 3 new software
components to the screencast server/client, highlighted in bold in
Figure 8: (i) a bandwidth estimator, (ii) a threshold chooser, and (iii) a
bitrate reconfigurator.
Their interactions are as follows.
When the screencast server starts, the screen contents are captured by the
video capturer and sent to the video encoder, whose output is then streamed to the
screencast client by the RTSP/RTP sender. After receiving video frames from the server
for a period of time $T$, the RTSP/RTP receiver at the client calculates the current
packet loss rate $\rho$ and throughput $\theta$ and reports them to the threshold chooser.
The threshold chooser selects the aggregation threshold $a$, which is a percentile
of the inter-arrival times of the packets received in $T$. Adjacent packets with an
inter-arrival time smaller than the aggregation threshold are considered parts of the
same aggregated frame in 802.11n.
The bandwidth estimator uses the timestamps of the video packets, sent from the
RTSP/RTP receiver, and the empirically chosen percentile $a$, sent from the threshold
chooser, to estimate the available bandwidth $\bar{c}_a$ as the ratio of packet size to
dispersion time among clustered adjacent probing (i.e., video) packets.
After that, the bandwidth estimator sends the estimated bandwidth to the bitrate
reconfigurator at the server, which maps it to the target encoding bitrate $b$.
The target encoding bitrate is then sent to the video encoder.
We present the detailed design of the added components in the next subsection.
6.2 Design of Major Adaptation Components
We build our bandwidth estimator on top of the state-of-the-art
WBest+ [Farshad et al. 2014], which
was proposed to estimate the available bandwidth in 802.11n networks.
However, its authors did not specify the actual value of the
aggregation threshold, and we address this limitation. In particular, we first construct the
Cumulative Distribution Function (CDF) of inter-arrival times, and then determine the
percentile $a$ that best matches the average estimated bandwidth to the ground truth given by Iperf.
We conduct experiments in our lab with live interference traffic to find the aggregation
threshold $a$.
We consider the following two scenarios:
(i) both the server and the client are launched
on End Devices (EDs), which are connected via a Wi-Fi Access Point (AP), and (ii) the server and the
client are launched on an ED and an AP, respectively. We vary the
distance between the server and the client among 1, 2, 4, and 8 m. In our experiments,
the server streams captured video to the client for 3 minutes at each distance, and we report the
resulting aggregation thresholds in Table VII.
This table shows that the aggregation thresholds are diverse and dynamic; e.g.,
the difference can exceed 10%.
Therefore, we propose the threshold chooser, which dynamically adjusts the aggregation
threshold, to enhance our bandwidth estimator.
The bitrate reconfigurator then sets the encoding bitrate proportional to
the estimated bandwidth.
We also conduct experiments to determine the best values of the system parameters $\rho_f$ and
$b_f$, used for adjusting the aggregation threshold and
the encoding bitrate, respectively. We vary $\rho_f, b_f \in \{0.5, 0.6, 0.7, 0.8, 0.9\}$
and measure the throughput and the packet loss rate. Across different $b_f$ values,
the differences between the highest and the lowest throughput and packet loss rate
are only 1.9% and 1.5%, respectively. Therefore, we choose the
median, 0.7, as our default $b_f$ value. Similarly, the difference between the highest and
the lowest throughput across different $\rho_f$ values is only 2.7%. Thus, we choose 0.7
as the default $\rho_f$ value, because it yields the lowest packet loss rate (3.11%) among all considered $\rho_f$ values.
Threshold chooser.
The threshold chooser considers the packet loss rate and throughput to
choose the aggregation threshold. For example, a high packet loss rate indicates that
the aggregation threshold should be raised to lower the estimated available bandwidth.
In contrast, if the throughput is high and there is no packet loss, the aggregation
threshold should be reduced.
In our implementation, the aggregation threshold $a$ is adjusted with step size $\delta$
once every $T$ seconds as follows.
We increase the aggregation threshold if the current packet loss rate
is higher than the average packet loss rate $\rho_{avg}$ multiplied by a system parameter
$\rho_f$. Unless otherwise specified, we let $\rho_f = 0.7$ according to our preliminary experiments.
We note that the actual bitrate of the coded video typically differs from the target
encoding bitrate, depending on the amount of information to code and the inaccuracy of the rate control.
Therefore, we introduce a new parameter $\theta_f$ to account for this discrepancy as follows.
We reduce the aggregation threshold if: (i) the current packet loss rate does not exceed
the minimum packet loss rate $\rho_L$, and (ii) the achieved throughput is higher than a
fraction $\theta_f$ of the target bitrate, i.e., $\theta > \theta_f \times b$.
Unless otherwise specified, we set $\rho_L = 0\%$ and $\theta_f = 95\%$.
Bandwidth estimator.
Let $s_p$ and $t_p$ denote the size and the arrival time of packet $p$.
The packet dispersion time, which is the inter-arrival time between two adjacent received
packets, is $t_p - t_{p-1}$, and the instantaneous bandwidth is calculated
as $s_p/(t_p - t_{p-1})$. We first sort all the packet dispersion times collected in $T$
seconds in ascending order, and then use $a$ to cluster the received packets for
estimating the available bandwidth $\bar{c}_a$.
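The estimator described above can be sketched as follows. Because $a$ is a percentile (e.g., 0.36), the sketch compares each dispersion with the dispersion value found at that percentile of the sorted list; this reading and the nearest-rank indexing are our assumptions, as is the hypothetical function name. Sizes are in bits and timestamps in seconds.

```python
from typing import Sequence


def estimate_available_bandwidth(sizes_bits: Sequence[float],
                                 arrivals_s: Sequence[float], a: float) -> float:
    """Estimate the available bandwidth c_a (bit/s) from video packets received in one period T.

    Dispersions t_p - t_{p-1} at or below the a-th percentile of all dispersions are treated
    as packets of the same 802.11n aggregated frame and skipped; the estimate is the mean of
    s_p / (t_p - t_{p-1}) over the remaining packets.
    """
    if len(sizes_bits) != len(arrivals_s) or len(arrivals_s) < 2:
        raise ValueError("need at least two packets with matching sizes and timestamps")
    if not 0.0 <= a <= 1.0:
        raise ValueError("a is a percentile in [0, 1]")
    dispersions = [arrivals_s[p] - arrivals_s[p - 1] for p in range(1, len(arrivals_s))]
    ordered = sorted(dispersions)  # ascending order, as in the text
    # Dispersion value at the a-th percentile; nearest-rank indexing is an assumption.
    threshold = ordered[min(int(a * len(ordered)), len(ordered) - 1)]
    rates = [sizes_bits[p] / dispersions[p - 1]
             for p in range(1, len(arrivals_s)) if dispersions[p - 1] > threshold]
    if not rates:
        raise ValueError("no dispersion exceeds the aggregation threshold")
    return sum(rates) / len(rates)
```

Raising $a$ excludes more short dispersions and therefore lowers the estimate, which is the lever the threshold chooser uses when packet loss is high.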
Bitrate reconfigurator.
The bitrate reconfigurator obtains the available bandwidth $\bar{c}_a$ from the bandwidth
estimator and then reconfigures the encoder with encoding bitrate $b$, which is a function
of $\bar{c}_a$. Unless otherwise specified, we conservatively let $b = b_f \bar{c}_a$,
according to our preliminary experiments.
Pseudocode of our algorithm.
Algorithm 1 gives the pseudocode of our rate adaptation algorithm.
Lines 3-7 belong to the threshold chooser. Line 3
checks whether the achieved throughput is high and the packet loss rate is low; if so,
line 4 reduces the aggregation threshold for a higher estimated available bandwidth. Line 5 checks
whether the current packet loss rate is high; if so, line 6 increases the aggregation threshold.
Line 8 is part of the bandwidth estimator, which computes the available bandwidth as the average of
$s_p/(t_p - t_{p-1})$ over all packets with $t_p - t_{p-1} > a$.
The server then reconfigures the encoding bitrate in line 9.
Instead of dynamically selecting the aggregation threshold for bandwidth estimation and encoding
bitrate, Hong et al. [Hong et al. 2015] choose the mean capacity among all packets
as the effective capacity, while
Huang et al. [Huang et al. 2015] model the percentile capacity for single-hop bandwidth
estimation. Compared with the approach in the current article, these methods are less
flexible in diverse and dynamic wireless networks.
Javadtalab et al. [Javadtalab et al. 2015] estimate changes in the available bandwidth during
video streaming using weighted inter-arrival times of video packets. Their method is not tailored
for 802.11n or other short-range wireless networks, which are the most common environments for screencasting.
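Algorithm 1: Rate Adaptation Algorithm
1: every $T$ seconds do
2:   compute $\rho$ and $\theta$
3:   if $\theta > \theta_f \times b$ and $\rho \le \rho_L$ then
4:     $a = a - \delta$
5:   else if $\rho > \rho_{avg} \times \rho_f$ then
6:     $a = a + \delta$
7:   end if
8:   compute $\bar{c}_a$
9:   $b = b_f \times \bar{c}_a$; reconfigure the video encoder with $b$
10: end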
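One period of Algorithm 1 can be sketched in Python as below. The default values of $\rho_f$, $\theta_f$, $\rho_L$, $b_f$, and the initial $a$ come from the text; the step size `delta`, the starting bitrate, clamping $a$ to [0, 1], keeping the previous $b$ when no estimate is available, and the caller-supplied running average `rho_avg` are our assumptions. The class and method names are hypothetical, and the compact estimator repeats the percentile-based reading of $a$ used above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


def estimate_bandwidth(sizes_bits: Sequence[float], arrivals_s: Sequence[float],
                       a: float) -> Optional[float]:
    """Mean of s_p / (t_p - t_{p-1}) over dispersions above the a-th percentile (None if none)."""
    d = [arrivals_s[p] - arrivals_s[p - 1] for p in range(1, len(arrivals_s))]
    if not d:
        return None
    ordered = sorted(d)
    threshold = ordered[min(int(a * len(ordered)), len(ordered) - 1)]
    rates = [sizes_bits[p] / d[p - 1] for p in range(1, len(arrivals_s)) if d[p - 1] > threshold]
    return sum(rates) / len(rates) if rates else None


@dataclass
class RateAdapter:
    a: float = 0.36       # initial aggregation threshold (percentile) used in our experiments
    b: float = 1e6        # current target encoding bitrate in bit/s (hypothetical start value)
    delta: float = 0.02   # step size for a (hypothetical; not given in the text)
    rho_f: float = 0.7
    theta_f: float = 0.95
    rho_l: float = 0.0
    b_f: float = 0.7

    def step(self, rho: float, theta: float, rho_avg: float,
             sizes_bits: Sequence[float], arrivals_s: Sequence[float]) -> float:
        """Run one period T of Algorithm 1 and return the new target bitrate b."""
        # Lines 3-7: threshold chooser (clamping a to [0, 1] is our safeguard).
        if theta > self.theta_f * self.b and rho <= self.rho_l:
            self.a = max(0.0, self.a - self.delta)
        elif rho > rho_avg * self.rho_f:
            self.a = min(1.0, self.a + self.delta)
        # Line 8: bandwidth estimator.
        c_bar = estimate_bandwidth(sizes_bits, arrivals_s, self.a)
        # Line 9: bitrate reconfigurator; keep the previous b if no estimate is available.
        if c_bar is not None:
            self.b = self.b_f * c_bar
        return self.b
```

In a real deployment, `step` would be invoked by the client once per period $T$, and the returned bitrate would be forwarded to the server's encoder.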
6.3 Experiments
We have implemented our proposed rate adaptation components in the GA server/client.
The AP is built on a Linux box using a wireless adapter with an Atheros chip.
Our proposed system is meant to be used in the real world; thus, we conduct
experiments in our lab, which is one of the typical scenarios for screencasting.
Our experiments last 2 to 3 hours, and we believe that the wireless network conditions do not
fluctuate much within such a short time period.
In our experiments, both the server and the client run on ThinkPad Linux laptops.
We randomly choose a content type from each of the three categories
(see Section 4.2) and play the videos on the server.
We save the videos at both the server and the client into YUV files, which
are used for quality assessment. We run Wireshark at the server for network-related
results, such as bitrate, video quality, frame loss rate,
estimated bandwidth, throughput, and packet loss rate. The
estimated bandwidth is the available bandwidth reported by the bandwidth
estimator. The throughput is the number of bits per second received at the
client. The packet loss rate is the fraction of lost packets. In addition, we
derive PSNR by first matching the frames in the server- and client-side YUV files
using color bars pre-inserted into the video frames. For lost frames, we
replay the previously decoded frame at the client side to mimic a basic error
concealment approach. We then compute the PSNR values accordingly.
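The frame alignment with basic error concealment can be sketched as follows, assuming the frame indices have already been recovered from the pre-inserted color bars (that decoding step is not shown). The function name is hypothetical, and returning `None` before the first received frame, when there is nothing to replay, is our assumption.

```python
from typing import Dict, List, Optional, TypeVar

Frame = TypeVar("Frame")


def conceal_lost_frames(num_frames: int, received: Dict[int, Frame]) -> List[Optional[Frame]]:
    """Align client frames with server frame indices 0..num_frames-1.

    `received` maps a recovered frame index to the decoded client frame. A missing index
    replays the previously decoded frame; before the first received frame there is
    nothing to replay, so None is returned for those positions.
    """
    aligned: List[Optional[Frame]] = []
    last: Optional[Frame] = None
    for index in range(num_frames):
        if index in received:
            last = received[index]
        aligned.append(last)
    return aligned
```

Each aligned client frame can then be compared with the server frame at the same index to compute per-frame PSNR, so lost frames are penalized by how much the scene changed since the last decoded frame.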
Figure 9: Sample results from Movie/TV: (a) estimated bandwidth,
(b) target bitrate, (c) encoding bitrate, (d) aggregation threshold,
(e) packet loss rate, and (f) quality in PSNR.
Figure 10: Sample results from Application:
(a) estimated bandwidth, (b) target bitrate, (c) encoding bitrate,
(d) aggregation threshold, (e) packet loss rate, and (f) quality in PSNR.
Functionality of our adaptation algorithm.
We fix the distance between the server and the client at 4 meters, which is in line
with one of the most popular screencast scenarios: streaming presentation slides
over a Wi-Fi network to a projector in a conference room. The server plays
different content types and streams captured videos to the client for 3 minutes
in each experiment. Figure 9 shows sample results from
the Movie/TV category (Talk Show), whose content is more complex. We first plot the
estimated available bandwidth over time for Movie/TV in
Figure 9(a). The target encoding bitrate is then computed as a
function of $\bar{c}_a$, and its values are illustrated in
Figure 9(b). Figure 9(c) shows the
actual encoding bitrates at the server. The average encoding bitrate of
Movie/TV is about 7.3 Mbps, which is close to the target bitrate.
Figure 10 gives similar results for the Application
category (PowerPoint). However, we observe that its average encoding bitrate is
0.6 Mbps lower than the target bitrate.
A deeper investigation indicates that this is because Applications have lower
complexity and are easier to compress.
The dynamics of the aggregation threshold.
Figure 9(d) plots the adaptive aggregation threshold over time. In our experiments,
the initial aggregation threshold is empirically set to 0.36. The aggregation threshold in
Figure 9(d) continuously decreases until t = 90 s, because the throughput
approaches the target bitrate and the packet loss rate is zero, which indicates
that the aggregation threshold should be reduced for a higher bandwidth
estimate. Figure 9(e) plots the packet loss rate over time
for Movie/TV. Comparing Figures 9(d)
and 9(e), the threshold chooser detects the high packet loss
rate at t = 90 s and increases the aggregation threshold accordingly.
Figures 10(d) and 10(e) show similar
behavior at t = 135 s. The threshold chooser adjusts the aggregation
threshold to adapt to the current network conditions, and helps to
quickly recover from video quality degradation due to high packet loss
rates.
Figure 11: Performance at different distances: (a) estimated bandwidth, (b) quality in PSNR, (c) frame loss rate,
and (d) packet loss rate.
Implications of distance on performance.
Screencasting most likely takes place in conference rooms, homes, and classrooms.
Thus, we consider distances of 1, 2, 4, and 8 m between the server and the client.
We plot the average estimated available bandwidth for different content types at
diverse distances in Figure 11(a). This figure shows that the
estimated available bandwidth tends to decrease as the distance increases,
which is consistent with our intuition. Figure 11(b) plots
the video quality at different distances. This figure shows the robustness of
our proposed rate adaptation across distances: the PSNR values are always higher
than 40 dB (see footnote 2).
In addition, the PSNR values of Applications are higher than those of the other
2 categories by about 15 dB on average. This can be attributed to the lower complexity of
Applications. Figures 11(c) and 11(d)
report the frame loss rate and the packet loss rate, respectively. We note that
the frame loss rate is usually higher than the packet loss rate. This is because each
frame is typically segmented into multiple packets, and losing any packet of a
frame leads to a frame loss. Movie/TV usually suffers from a higher loss rate
than the others, because it has higher complexity and hence produces more packets.
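The gap between frame and packet loss rates can be illustrated with a simple model. Assuming, purely for illustration, that each packet is lost independently with probability p, a frame carried in n packets survives only if all n packets arrive, so it is lost with probability 1 - (1 - p)^n. Real wireless losses are often bursty, so this independence assumption is a simplification, and the function name is hypothetical.

```python
def frame_loss_probability(packet_loss_rate: float, packets_per_frame: int) -> float:
    """Probability that a frame is lost when it is split into packets_per_frame packets
    and each packet is lost independently with probability packet_loss_rate."""
    if not 0.0 <= packet_loss_rate <= 1.0 or packets_per_frame < 1:
        raise ValueError("invalid loss rate or packet count")
    return 1.0 - (1.0 - packet_loss_rate) ** packets_per_frame


if __name__ == "__main__":
    for n in (1, 5, 10):
        print(n, round(frame_loss_probability(0.01, n), 4))
```

With a 1% packet loss rate, a frame of 10 packets is lost about 9.6% of the time under this model, which is why more complex content that produces more packets per frame tends to show higher frame loss rates.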
Figure 12: Comparisons between static and adaptive bitrate: (a) frame loss rate and (b) quality in PSNR.
Effectiveness of our rate adaptation mechanism.
We conduct experiments at 8 m to compare our proposed algorithm with two static bitrate
configurations: (i) 1 Mbps and (ii) 12 Mbps. We report the frame loss
rate and PSNR in Figures 12(a) and 12(b).
We observe that the 12 Mbps configuration suffers from a high frame loss rate and thus
leads to inferior video quality, except for Applications. A closer look at the
encoding bitrate of the Applications video reveals that its contents are encoded at
6.6 Mbps on average. Therefore, even when the available bandwidth drops, the
video quality of Applications may not be affected much. Last, although
conservatively setting the static bitrate at 1 Mbps avoids a high frame loss
rate, it also results in lower video quality. In particular, our adaptation
algorithm outperforms the 1 Mbps configuration by at least 5 dB in PSNR in our
experiments.
7 Conclusion
Although screencasting is becoming increasingly popular, researchers and
developers resort to ad-hoc design decisions, because the performance of
screencast technologies has not been thoroughly measured. In this article,
we have developed a comprehensive measurement methodology for screencast technologies
and carried out a detailed analysis of several commercial and one open-source
screencast technologies.
The presented methodology is also applicable to other and future technologies,
such as new screencast products and non-Intel hardware codec SDKs.
Our comparative analysis shows that all screencast technologies have advantages
and disadvantages, which in turn demonstrates that the state-of-the-art
screencast technologies can be further improved by making educated design
decisions based on quantitative measurement results. Exercising different
design decisions using commercial screencast technologies is, however,
impossible, because these technologies are proprietary and closed. We have
presented how to customize GA into a screencast platform, which enables
researchers and developers to perform experiments on a real testbed when
facing various design considerations. Several sample experiments related to
actual design decisions have been discussed; e.g., we have found that
hardware video encoders largely reduce the CPU usage, while slightly increasing
the GPU usage and end-to-end latency. We have highlighted the importance of
well-designed rate adaptation mechanisms for dynamic wireless networks, and
designed, implemented, and evaluated a practical rate adaptation algorithm on
GA. The lessons learned in our research shed some light on building
next-generation screencast technologies.
Future work. The current article can be extended in several
directions. For example, we plan to study more ubiquitous multi-display usage
scenarios, where several handheld device users share displays in a natural
fashion. Another possible extension is to adopt an alternative transport protocol, such as
Quick UDP Internet Connections (QUIC) [Roskind 2012], which leverages Forward Error Correction (FEC) to mitigate
packet losses.
Figure 13: Performance under the default settings: (a) bitrate, (b) latency, (c) frame loss, and (d) quality in PSNR.
We report the results under the default configurations (see
Table III). Each experiment lasts 9 minutes 18
seconds, with 33,480 video frames. For each screencast technology, we
first calculate the bitrate, latency, and video quality of individual
video frames rendered by the receiver (i.e., lost frames are not
considered), and then compute the mean and standard error of each metric
across all these video frames. We also derive the frame loss rate of
each experiment. We plot the results in Figure 13 and
make several observations. First, AirPlay and Miracast both lead to
high bitrate and low latency, while Miracast achieves a much lower frame
loss rate. Second, although Chromecast incurs very low bitrate, it
suffers from high latency and a high frame loss rate. Third, Splashtop
and MirrorOp achieve similar bitrate and video quality, but Splashtop
leads to lower latency and frame loss rate. Fourth, screencast
technologies other than Miracast lead to roughly the same video
quality. Last, GA leads to low bitrate, low latency, and good video
quality, but a slightly higher frame loss rate. Figure 13
reveals that most screencast technologies have some weaknesses; e.g.,
AirPlay and Miracast incur higher bitrate, Chromecast and MirrorOp
suffer from high latency, and Chromecast also results in a high frame
loss rate. In contrast, Splashtop and GA perform fairly well in terms of
all metrics. GA's relatively high frame loss rate can be partially attributed to
the UDP protocol it adopts by default, and we take a closer look at the
implications of switching to TCP in
Section 5.2. We omit the figure of quality in
SSIM, because it shows almost identical trends to PSNR.
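The per-frame statistics described above can be sketched as follows, assuming each experiment yields a list with one metric value per frame and `None` marking a frame the receiver never rendered. That encoding and the helper names are hypothetical; the standard error is the sample standard deviation divided by the square root of the number of rendered frames.

```python
import math
import statistics
from typing import Optional, Sequence, Tuple


def mean_and_standard_error(per_frame: Sequence[Optional[float]]) -> Tuple[float, float]:
    """Mean and standard error of a per-frame metric, skipping lost frames (None)."""
    values = [v for v in per_frame if v is not None]
    if len(values) < 2:
        raise ValueError("need at least two rendered frames")
    return statistics.fmean(values), statistics.stdev(values) / math.sqrt(len(values))


def frame_loss_rate(per_frame: Sequence[Optional[float]]) -> float:
    """Fraction of frames that were never rendered by the receiver."""
    if not per_frame:
        raise ValueError("no frames")
    return sum(v is None for v in per_frame) / len(per_frame)
```

Excluding lost frames from the mean keeps the latency and quality statistics about rendered frames only, while the frame loss rate captures the missing ones separately.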
Figure 14: Performance under different frame rates: (a) bitrate, (b) latency, (c) frame loss, and (d) quality in PSNR.
Figure 15: Performance under different bandwidth: (a) bitrate, (b) latency, (c) frame loss, and (d) quality in PSNR.
Figure 16: Performance under different delays: (a) bitrate, (b) latency, (c) frame loss, and (d) quality in PSNR.
9.2 Performance under Diverse Workload and Network Conditions
We vary frame rates to generate different amounts of traffic. We plot
the performance results in Figure 14, which leads
to several observations. First, AirPlay and Miracast incur higher
bitrates at 15 fps than other screencast technologies do at 30 and 60 fps.
Second, higher frame rates generally result in higher latencies and
frame loss rates, due to saturated network resources. Third, frame
rates have only minor impacts on video quality.
Next, we configure different network conditions in terms of network
bandwidth and delay, and plot the observed screencast performance in
Figures 15 and 16,
respectively. We make some observations on
Figure 15. First, AirPlay, Chromecast, and Miracast
adjust the bitrate according to the available bandwidth, while GA,
MirrorOp, and Splashtop maintain the same bitrate independent of the
bandwidth. Second, Chromecast and MirrorOp suffer from excessive
latency, while the other screencast technologies perform reasonably well.
Third, Miracast results in seriously degraded video quality with lower
bandwidth, which can be attributed to its overly aggressive bitrate
usage. On the other hand, we also make some observations on
Figure 16. First, AirPlay and Splashtop are
sensitive to delay, because they both reduce the bitrate as the delay
increases. Second, higher delay generally results in higher latency and
frame loss rate, while GA and Miracast outperform the other screencast
technologies in these two aspects. Last, only AirPlay and MirrorOp
suffer from degraded video quality under longer delay, which we suspect
may be partly due to the TCP protocol they adopt (cf. Table IV).
Figure 17: Ranks of different screencast technologies under different
conditions. Ticks closer to the origins represent lower ranks (worse
performance).
9.3 Performance Ranking
We study the ranking of these screencast technologies under different
conditions. In addition to the default condition, we define high
frame rate by increasing the frame rate to 60 fps, lossy network
by setting the packet loss rate to 2%, high delay network by
setting the network delay to 200 ms, and low bandwidth network by
setting the bandwidth to 4 Mbps. For each condition, we compute the
performance metrics, and rank the screencast technologies on each
metric independently3. We then plot the results
in the form of radar chart in Figure 17, where each of the
four axes reports the ranking of screencast technologies in terms of a
particular performance metric. This figure reveals that: (i) Splashtop
performs the best and is balanced in general, and it is never ranked the
last in all aspects, (ii) AirPlay and GA perform reasonably well in all
aspects, trailing Splashtop, and (iii) Chromecast, Miracast, and
MirrorOp lead to inferior performance in general. The figure also
reveals potential limitations of individual screencast technologies.
For example, under lossy network conditions, GA results in lower video
quality and higher latency, which can be mitigated by adding error
resilience tools to it.
9.4 Tolerance Ranking
We perform a tolerance analysis to quantify how much impact each
parameter has on each performance metric for diverse screencast
technologies. For each screencast technology, we vary one parameter while
fixing all other parameters at their default values. We repeat each
experiment multiple times and compute the mean performance of each
experiment. For each metric, we compute the tolerance,
which is defined as one minus the ratio of the range (i.e., the difference
between the maximum and the minimum) to the minimum, i.e.,
$1 - (x_{\max} - x_{\min})/x_{\min}$ over the mean values $x$. If the tolerance
is smaller than 0, we set it to 0. A larger tolerance (closer to 1)
means more stable performance, while a smaller tolerance (closer to 0)
indicates that the particular parameter affects the
performance metric more prominently.
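The tolerance definition above can be sketched directly, with `mean_per_setting` holding the mean metric value measured at each value of the varied parameter. The text does not say how a zero minimum (e.g., a frame loss rate of 0) is handled; returning 1.0 when all values are zero and 0.0 otherwise is our assumption, as is the function name.

```python
from typing import Sequence


def tolerance(mean_per_setting: Sequence[float]) -> float:
    """Tolerance of one metric to one parameter: 1 - (max - min) / min, floored at 0.

    Zero-minimum handling (1.0 if all values are zero, else 0.0) is an assumption.
    """
    if not mean_per_setting:
        raise ValueError("need at least one measurement")
    lo, hi = min(mean_per_setting), max(mean_per_setting)
    if lo == 0:
        return 1.0 if hi == 0 else 0.0
    return max(0.0, 1.0 - (hi - lo) / lo)
```

For example, latencies of 100, 110, and 120 ms across parameter values give a tolerance of 0.8, whereas any metric whose maximum is at least twice its minimum is floored at 0.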
Figure 18: Tolerance of different screencast technologies to different
workload and network conditions. Lower tolerance (closer to the
origins) means higher vulnerability to dynamic environments.
We report the tolerance ranks of latency, frame loss rate, and video
quality in Figure 18, where the five axes of each
radar chart represent the impact of the five workload/network
parameters, and the ticks closer to the origins indicate lower tolerance due
to the particular parameter associated with the axis. We make several
observations. First, the latency achieved by MirrorOp does not change
under different parameters, while latency achieved by other screencast
technologies is vulnerable to at least one parameter. For example,
AirPlay is vulnerable to changes in network delay and packet loss rate,
and Chromecast is vulnerable to changes in bandwidth. Second, the frame
loss rates achieved by AirPlay and Splashtop are vulnerable to changes
in all parameters, while the frame loss rates of all screencast
technologies are vulnerable to changes in the frame rates. Third, most
considered screencast technologies achieve stable video quality, except
Miracast and MirrorOp, which are sensitive to bandwidth and network
delay, respectively. In summary, the frame loss rate is the most
vulnerable metric, while all screencast technologies handle video
quality quite well. Overall, MirrorOp performs the best, and GA may be
enhanced to better adapt to changes in frame rate and delay.
Nevertheless, the degree of tolerance needs to be
interpreted together with the performance when evaluating
screencast technologies. For example, MirrorOp performs the best in
terms of tolerance. We believe that this is mainly due to its much
longer latency (see Figure 13): maintaining a nearly
constant latency and frame loss rate is relatively easy compared to
screencast technologies with shorter latencies (such as AirPlay, GA,
Miracast, and Splashtop). Thus, tolerance should be considered only
after the achieved performance is satisfactory,
and we cannot conclude that one screencast technology is better than
others solely based on the tolerance comparisons in
Figure 18.
Footnotes:
1. dummynet is a
network emulation tool, initially designed for testing networking
protocols. It has been used in a variety of applications, such as
bandwidth management.
2. The PSNR values of our proposed GA in Figures 11(b)
and 12(b) are higher than those in Figure 4,
because these figures come from independent experimental setups.
3. We use PSNR as the video quality metric,
but SSIM leads to nearly identical ranking in our experiments.
Sheng-Wei Chen (also known as Kuan-Ta Chen) http://www.iis.sinica.edu.tw/~swc
Last Update September 28, 2019