In recent disasters, the web has served as a medium
of communication among disaster response teams,
survivors, local citizens, curious onlookers, and
zealous people who are willing to assist victims
affected by disasters. To encourage and speed up
information dissemination, the availability and
convenience of use are normally the top concerns in
designing disaster response web services, where a
design of free-formed inputs without access control
is commonly adopted. However, such a design may
result in personal information disclosure and
privacy leakage.
In this paper, using a case study of a real-life disaster response service, the MKER (Morakot Event Reporting) forum, we show that the disclosure of personal information and the resulting privacy leakage is indeed a serious and ongoing problem. In our case, we successfully mapped 1,438 unique cell phone numbers and 1,383 unique addresses to individuals using an automated method, and manual analysis of the messages posted on the forum could enable a far greater invasion of privacy. To resolve this issue, we propose several means to mitigate and prevent such privacy leakage on disaster response services.
crisis informatics; disaster management; disaster response; privacy leakage; situation awareness; user privacy
1 Introduction
As web technology has become pervasive with the ubiquity of the Internet, it has also expanded the reach of disaster sociology: people can now easily participate in disaster response and relief efforts, whether through peer-to-peer communication or public participation [1, 2, 3].
In recent disasters, the web has served as a medium
of communication among disaster response teams,
survivors, local citizens, and even curious
onlookers. Furthermore, it provides opportunities
for zealous people who would like to assist victims
affected by disasters [1, 4] and
actively engage in the creation of valuable
information rather than being passive information
consumers.
When a disaster occurs, all individuals respond to
it in their own ways. The communication needs
during the crisis response stage are extraordinary
in both volume and form. Synchronous
communication, such as phone conversations, is
heavily used among emergency response teams and
between people in the disaster area and those
outside the area. Meanwhile, asynchronous
communication via the web is equally important. For
example, people who are in the vicinity of a crisis
can provide firsthand situation updates to the
public via the web, and people who cannot reach
acquaintances in the affected area can call for
help on the web.
To fulfill the asynchronous communication needs during crisis response, websites are set up for various purposes (cf. Section 2.1), such as situation overview, relief information, and donation management. Such websites are especially useful when certain information, such as a search for missing people, needs to be broadcast to an unspecified audience and real-time responses are not necessary or feasible.
Although disaster response web services are asynchronous in nature, availability and convenience of use are normally the top concerns in providing them, so as to encourage and speed up information dissemination. As a result, these sites mostly provide information for public access without the role-based access control that is commonly used by other websites. Furthermore, they usually accept free-formed inputs from the public because countless types of information may need to be disseminated via the websites. Because efficiency is weighted much more heavily than privacy on disaster response services, we anticipate that personal information disclosure and the resulting privacy leakage are a serious issue worthy of investigation.
In this paper, using a case study based on a real-life disaster response service, the MKER (Morakot Event Reporting) forum, we show that the disclosure of personal information and the resulting privacy leakage is indeed a serious and ongoing problem. In our case, we successfully mapped 1,438 unique cell phone numbers and 1,383 unique addresses to individual persons using a fully automated approach, and manual analysis of the messages posted on the forum could enable a far greater invasion of privacy. Once the leaked private information becomes available for malicious use, the consequences would be disastrous given the huge amount of information on a single site.
Our contribution in this paper is threefold:
1) We identify the privacy risks that are caused by users' communication on disaster response web services.
2) As a demonstration, we extensively analyze the MKER forum¹, a disaster response forum set up for Typhoon Morakot in Aug 2009, to understand how people utilize such services with free-formed inputs and how personal information is disclosed. We also present automated analysis techniques to quantify privacy leakage on the forum.
3) We propose several solutions to mitigate and prevent such privacy leakage.
The remainder of this paper is organized as follows. Section 2 reviews common disaster response web services and related work. To illustrate the privacy leakage risks, we introduce Typhoon Morakot and the MKER forum that was set up in response to it in Section 3. In Section 4, we present how we analyze the messages publicly available on the MKER forum and how privacy leakage arises. In Section 5, we propose several means for mitigating and preventing the aforementioned privacy leakage on current and future disaster response sites. Finally, Section 6 draws our conclusion.
2 Background
Web disaster response services have proved valuable in that they provide efficient means to help gather real-time updates from social reporters (some of whom may be witnesses of the crisis) who are at vantage points and able to access firsthand information. This crowdsourcing model demonstrated its usefulness and played an unprecedented role in the recent Haiti earthquake response [5]. In this section, we first provide an overview of various disaster response services on the web and then review related work that inspected the use of web services in recent disasters.
2.1 Disaster Response Web Services
When facing a disaster, people use
information from any kind of source as long as it
satisfies their needs and informs their
actions [6]. In this subsection, we
categorize commonly seen web services designed for
disaster response.
2.1.1 Information Portal
Sahana [7] is an open-source
disaster management system that addresses the common
coordination requirements targeting relief
operations and rehabilitation efforts. It was
officially deployed by the Center for National
Operations in Sri Lanka as a part of their official
portal in 2005. Since then, it has been used in
response to the 2005 Kashmir earthquake, the 2006
Guinsaugon mudslides in the Philippines, and the 2006
Yogyakarta earthquake in Indonesia.
The Disaster Portal created by Project RESCUE [8] is an example of a web portal for bidirectional communication between response parties and the public during emergency situations. It features a situation overview map, emergency shelter status, donation management, and services for family reunification. It was officially launched by the City of Ontario during the Southern California wildfires in Oct 2007 [9].
2.1.2 Social Media Web Service
A variety of social media web services have proved helpful in disaster response. Here we illustrate their use in recent disasters:
Social networking websites (such as Facebook and MySpace): Facebook was used as an information gathering center during the 2007 Virginia Tech and the 2008 Northern Illinois University shootings [10].
Microblogging services (such as Twitter and Plurk): Broadcast communication via Twitter during the 2009 Red River floods was shown to be efficient and effective in enhancing the situational awareness of the public [11, 12].
Web publication tools (such as blogs and wikis): Wikipedia was used to generate and disseminate information during the 2007 Virginia Tech shooting. The complete list of the 32 victims it produced was available even before the university released the information [4].
Data mashup services (such as Google Maps, AlertMap, and FlickrVision): Google Maps was used by social reporters in the affected area to report and disseminate updates about the disaster during the 2007 California wildfires [13, 3].
Discussion boards and forums: Web forums have been used for coordinating disaster response and relief efforts. For example, during the 2008 Sichuan earthquake in China, netizens used a popular online discussion forum, Tianya Community², to share and disseminate information about the disaster [14].
2.2 Related Work
Starbird et al. [12] analyzed 7,183 Twitter messages (tweets) posted during the flooding of the Red River Valley in the US and Canada in March and April 2009 to investigate the form and content of Twitter communications regarding the hazard. Their analysis indicates that around 10% of the tweets were original, over a quarter of the tweets were synthetic (i.e., original tweet messages incorporating outside knowledge such as news from mass media and geographical facts), and around three quarters of the tweets were derivative information. In addition to the Red River flooding event, the same authors further examined the tweets posted during the Oklahoma grassfires of April 2009 [11] and identified information that may help enhance situational awareness about an ongoing disaster.
Torrey et al. [15] observed the online communication in response to Hurricane Katrina in 2005 in four online communities (two blogs and two forums) to understand how people who were willing to donate goods to disaster victims communicated online and facilitated the distribution of donations. They found that online communities played an important role in both information access and trust development in disaster relief, and that large discussion forums tend to be more sustainable in disaster relief than small community blogs.
Moreover, during the 2007 California wildfires, residents in the affected area used Google Maps mashups and social media such as Twitter to report and disseminate updates about the disaster [13, 3]. Meanwhile, popular social networking sites such as Facebook were used as information gathering centers during the 2007 Virginia Tech and the 2008 Northern Illinois University shootings [10]. Additionally, during the 2008 Sichuan earthquake in China, netizens used the Tianya Community to share and disseminate information about the disaster [14].
Although the use of social web services in disaster response has been extensively studied, the privacy issues in using such web services have rarely been mentioned. Herold [16] discussed concerns regarding sensitive information disclosure due to system malfunction or insecure data transmission. Motivated by a similar concern, we instead investigate the privacy disclosure risks in communications on disaster response forums and propose countermeasures to such risks without sacrificing the capability of the web services.
3 Typhoon Morakot and The Forum
3.1 Typhoon Morakot
Figure 1: The movement track of Typhoon Morakot between Aug 3, 2009 and Aug 11, 2009 [17]
Figure 2: The number of deaths and people missing caused by Typhoon Morakot in each county of Taiwan
Figure 3: The accumulated precipitation brought by Typhoon Morakot between Aug 5 and Aug 10, 2009
In the recorded history of Taiwan, Typhoon Morakot³ was the most severe typhoon in terms of casualties and injuries, and it produced a record-breaking level of rainfall. Early on Aug 7, 2009, the storm attained its peak intensity, with a wind speed of 140 km/h (equivalent to 87 mph) according to JMA⁴. Approximately 24 hours later, the storm emerged back over water into the Taiwan Strait and weakened to a severe tropical storm before making landfall in China on Aug 9 (see Figure 1).
During the four-day period (i.e., from Aug 7 to Aug 10) in which Typhoon Morakot struck Taiwan, the heavy rainfall in the mountain area was over 1,500 millimeters in 24 hours and the accumulated precipitation over this period was more than 3,000 millimeters (see Figure 3). The maximum cumulative rainfall over three consecutive days in the period approached the world's highest rainfall record⁵.
Table 1: A comparison between the Typhoon Morakot and annual precipitations observed in weather stations in the disaster area

| County | Station | Annual rainfall (mm) | Daily 08/07 (mm) | Daily 08/08 (mm) | Daily 08/09 (mm) | Daily 08/10 (mm) | Morakot 08/07-08/10 (mm) | Morakot vs. annual |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Chiayi | Alishan | 3,910 | 420 | 1,161 | 1,166 | 218 | 2,965 | 76% |
| Pingtung | Sandimen | 3,884 | 745 | 1,402 | 394 | 332 | 2,872 | 74% |
| Chiayi | Jhuci | 3,801 | 556 | 1,185 | 877 | 156 | 2,775 | 73% |
| Kaohsiung | Taoyuan | 4,086 | 501 | 1,283 | 583 | 423 | 2,790 | 68% |
| Kaohsiung | Liouguei | 3,138 | 236 | 1,178 | 696 | 351 | 2,461 | 78% |
| Chiayi | Fanlu | 3,437 | 708 | 815 | 601 | 79 | 2,202 | 64% |
| Chiayi | Dapu | 2,749 | 482 | 1,214 | 458 | 3 | 2,156 | 78% |
| Kaohsiung | Jiasian | 2,861 | 400 | 1,072 | 345 | 203 | 2,020 | 71% |
| Nantou | Sinyi | 3,254 | 170 | 717 | 909 | 134 | 1,929 | 59% |
| Kaohsiung | Maolin | 3,152 | 252 | 743 | 230 | 179 | 1,404 | 45% |
| Pingtung | Wutai | 2,898 | 206 | 580 | 208 | 165 | 1,160 | 40% |
| Kaohsiung | Cishan | 2,365 | 91 | 620 | 128 | 85 | 924 | 39% |
The heavy rainfall brought by Typhoon Morakot caused flooding in a number of areas covering a total of 400 square kilometers. The regions with heavy precipitation and flooding were mostly concentrated in the southwest of Taiwan, where a number of stations reported more than 70% of the annual precipitation in merely three days (Table 1). Among the affected regions, the most severely damaged was Shiaolin village in Kaohsiung County. The torrential rainfall caused a large-area landslide, which wiped out the whole village and killed around 500 people [18].
To summarize, Typhoon Morakot caused various kinds of damage, such as landslides, debris flows, fallen rocks, overflowed levees, building collapses, and road and bridge destruction. There were 677 deaths and 22 people missing due to the hazard (see Figure 2). Economically, the estimated loss in agriculture alone was more than US$5.3 billion.
3.2 The MKER Forum
Figure 4: The screenshot of the
MKER (Morakot Event Reporting)
forum
Figure 5: The screenshot of the "Making a
New Report" interface of the MKER
forum
When Typhoon Morakot was hitting Taiwan, a number of disaster response web services were set up for various purposes, such as information portals, situation updates, and donation management. Among these sites, the MKER (Morakot Event Reporting) forum⁶ went online on Aug 9, 2009 and was designed as an information exchange site for the Morakot disaster with a discussion-board-like interface. Since its launch, people with different roles gathered on this site for three primary purposes: 1) asking for relief support, 2) seeking missing people, and 3) posting situation updates about people, goods, and any information related to the disaster.
Figure 4 shows a screenshot of
the MKER forum. The information posted on the forum
is organized in "threads" (i.e., records), where a
thread contains the following fields: 1) the county
and the detailed address associated with an event,
2) contact information of the thread originator, 3)
description of the event, 4) help in need (if
applicable), and 5) responses (follow-ups) to the
thread.
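The thread layout described above can be sketched as a small data structure. This is only an illustration: the class and attribute names (`Thread`, `Response`, `help_needed`, `add_response`) are hypothetical and are not taken from the MKER implementation, and the sample values reuse the masked fifth example from Table 2.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Response:
    """One follow-up message appended to a thread."""
    timestamp: str  # as displayed on the forum, e.g. "08/11 23:16"
    text: str


@dataclass
class Thread:
    """A forum record holding the five fields listed above."""
    location: str          # county and detailed address of the event
    contact: str           # contact information of the thread originator
    description: str       # description of the event
    help_needed: str = ""  # help in need, if applicable
    responses: List[Response] = field(default_factory=list)

    def add_response(self, timestamp: str, text: str) -> None:
        """Append a follow-up message to the thread."""
        self.responses.append(Response(timestamp, text))


# Sample values are the masked entries of an example thread.
thread = Thread(
    location="Dabangu Vil., Chiayi County",
    contact="0989******",
    description="power cut, road closure in the village.",
    help_needed="relief, living goods",
)
thread.add_response("08/11 23:16", "contact name is Ms. Yang, Thanks.")
print(len(thread.responses), "response(s) on thread at", thread.location)
```

Note that every field here is free text, which is what makes the later extraction of names, addresses, and contacts both necessary and possible.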
Note that when a user initiates a new thread, he/she provides only the first four fields (cf. the "Submit a New Report" interface in Figure 5), as the Response field is intended to be supplied by others and can contain any number of replies from different respondents. To reduce duplicate reports, thread originators were encouraged to use the keyword search functionality before initiating a new thread. In this way, if the information to be posted is related to an existing thread, the user can append follow-up messages to the thread via the "Add Response" interface.
From Aug 7 to Oct 22, people posted a total of 4,315 threads and 12,244 responses on the MKER forum, where 85% of the threads were posted between Aug 9 and Aug 14, as shown in Figure 6. According to the graph, the forum received 1,030 threads on Aug 11, which was the first day after Typhoon Morakot struck Taiwan. To illustrate what the information on the forum looks like, we provide five example threads in Table 2. In the first example, the originator could not contact his/her three acquaintances and therefore asked for any news about them. The second example was similarly intended to look for two missing people, and in this case, someone responded to the thread that both people were safe.
To obtain an overview of the information posted on the forum, we randomly sampled 500 threads and manually classified them by intent. The sampled results indicate that 38% of the threads were disaster situation updates, 28% of the threads were inquiries about the status of certain people, and the remaining 34% of the threads were requests for relief.
Figure 6: The forum thread count timeline
Table 2: Examples of posts on the disaster relief forum we analyzed

| Location | Contact | Description | Relief/Help in Need | Response |
| --- | --- | --- | --- | --- |
| Namaxia Township, Minsheng Vil. | | Has anyone seen them? If so, plz contact me, thanks! | | [08/13 22:45] Can any Namaxia Township People who know them contact me? Or can anyone who know their family reply to this thread? |
| No.***, Nanheng Rd., Launong Vil., Liugui Township, Kaohsiung County | phone # 0938-******, 0910-******, Ms. Pan | Looking for ***Pan (28-year-old Male), ***Pan (43-year-old Female) | | [08/15 01:06] Both of them are safe! All people live there are safe. |
| Taoyuan Township, Kaohsiung County | | home was flooded | Please send more soldiers/ rescuers to the disaster area to help search bodies and assist survivors | [08/14 01:21] I agree with this suggestion. Hope Government officials can see this post. [08/14 01:24] I've contacted w/ Mr. Chou at the rescue center, Kaohsiung. 07-******ext**** |
| Xinfa Vil., Liugui Township | Mr. Chang, 0911****** | Does anyone know if ***Chien is safe? If you have any info please contact me. Thanks! | | [08/13 12:33] Looking for ***Lin. who lives in No.**, Xinkai, Liugui Township. Anyone with any info please contact 0987****** |
| Dabangu Vil., Chiayi County | 0989****** | power cut, road closure in the village. | relief, living goods | [08/11 23:13] Urgent need for food, water, living goods. [08/11 23:16] contact name is Ms. Yang, Thanks. |

† For privacy concerns, we have partially replaced sensitive information with *.
3.3 Privacy Leakage Potentials
From the example threads in Table 2, it can be observed that personal information on the MKER forum was completely disclosed to the public without any protection. Such information exposure seems unavoidable because, in many cases, the thread initiator would like anyone who has the requested information to contact him/her directly. In such critical conditions, mobile and landline phones are much preferred over Internet communication tools such as e-mail. In addition, when asking about the status of certain people, it is usually much more helpful, and sometimes mandatory, to provide the name and residential address, as well as the gender and age, of each person who is to be found.
As of the writing of this paper (Apr 2011), more than 1.5 years have passed since Typhoon Morakot left Taiwan, and there has not been any posting activity on the MKER forum. However, all the information posted during the Morakot crisis is still available to every Internet user (and web crawler) today. In fact, the same situation can be observed on many other disaster response websites for Typhoon Morakot, which we believe is due to reasons including historical archiving and memorial purposes. This further increases the privacy leakage risks and turns such disaster response websites into rich sources from which malicious users can dig out others' private information. Motivated by these observations, we will quantify the degree of privacy leakage on the MKER forum and propose several means to resolve this issue in the rest of this paper.
4 Privacy Leakage on the MKER forum
In this section, we analyze the threads on the MKER
forum and reveal privacy leakage based on that
information. We begin by defining and extracting
personal information from the threads, examine how
such information was disclosed by users, and
conclude this section by providing a measure of
privacy leakage on the forum.
Table 3: The degree of personal information disclosure in each of the Location, Contact, Description, and Response fields

| Information | Location | Contact | Description | Response | Overall |
| --- | --- | --- | --- | --- | --- |
| Name | 19 (0.4%) | 1,187 (27.5%) | 1,108 (25.6%) | 1,467 (34%) | 2,469 (57.2%) |
| Contact | 2 (0.1%) | 2,377 (55.0%) | 282 (6.5%) | 1,419 (32.8%) | 2,980 (69.0%) |
| Address | 3,420 (79.2%) | 83 (1.9%) | 1,412 (32.7%) | 1,631 (37.8%) | 3,769 (87.3%) |
| Either | 3,428 (79.3%) | 2,521 (58.4%) | 2,138 (49.5%) | 2,484 (57.5%) | 4,115 (95.3%) |
4.1 Personal Information Disclosure
To analyze the free-formed comments on the MKER
forum, we define the following three categories of
personal information:
Name: An identifiable person's name can
be either a full name (family name and first name),
a first name, or a title followed by a last name,
e.g., Ms. Cartmon.
Address: An identifiable address can
be either a county name, a village name, or a street
name.
Contact: An identifiable contact can
be either a cell phone number, a landline number, or
an e-mail address.
We developed a program that automatically extracts all the identifiable personal information above from publicly available users' comments on the MKER forum. The identification of person names and addresses is based on table lookups, while the identification of contacts is based on regular expressions, as both phone numbers and e-mail addresses have rather strict formats. As a result, we identified 2,866 unique personal names, 2,556 unique addresses, and 2,903 unique contacts from the 4,315 threads on the forum. Among the threads, 4,115 (95%) contain at least one category of personal information, while 1,920 (44%) contain all three categories. The numbers of threads containing each combination of identifiable personal information are shown in Figure 7.
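A minimal sketch of this kind of extraction is given below. It is not the program used in the study: the regular expressions are assumptions about common Taiwanese number formats (ten-digit cell phone numbers starting with 09, landlines with a short area code and a hyphen), and the two lookup tables are tiny hypothetical stand-ins for the name and address dictionaries.

```python
import re

# Assumed formats, not taken from the paper.
CELL_RE = re.compile(r"(?<!\d)09\d{2}-?\d{6}(?!\d)")
LANDLINE_RE = re.compile(r"(?<!\d)0[2-8]\d?-\d{6,8}(?!\d)")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical lookup tables; a real system would load full dictionaries.
NAME_TABLE = {"Ms. Yang", "Mr. Chang"}
ADDRESS_TABLE = {"Liugui Township", "Chiayi County", "Xinfa Vil."}


def extract(text):
    """Return (category, value, start, end) tuples found in text, in order."""
    found = []
    # Contacts: matched with regular expressions.
    patterns = (("cell", CELL_RE), ("landline", LANDLINE_RE), ("email", EMAIL_RE))
    for category, pattern in patterns:
        for m in pattern.finditer(text):
            found.append((category, m.group(), m.start(), m.end()))
    # Names and addresses: matched by table lookup.
    for category, table in (("name", NAME_TABLE), ("address", ADDRESS_TABLE)):
        for entry in table:
            for m in re.finditer(re.escape(entry), text):
                found.append((category, entry, m.start(), m.end()))
    return sorted(found, key=lambda item: item[2])


# Fictional sample comment, used only for illustration.
for item in extract("Mr. Chang 0911-123456, Xinfa Vil."):
    print(item)
```

Keeping the character offsets of each match is a deliberate choice here, since a later step can use them to relate names to nearby contacts or addresses.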
We further analyze the relationship between the role a person plays during disaster response and the tendency of his/her name to be disclosed on the web. By manually analyzing a random sample of 500 personal names, we found that 65% of the personal names belong to people affected by the disaster, 17% belong to the people who made the comments on the web, and the remaining 18% belong to third parties, such as firefighters and people in emergency response teams.
Figure 7: The Venn diagram of the
number of threads which contain personal names,
addresses, and contacts
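Table 4: A breakdown of the disclosure analysis of each personal information category in each field on the MKER forum

| Name | Location total | Location unique | Contact total | Contact unique | Description total | Description unique | Response total | Response unique | Overall total | Overall unique |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Full Name | 12 | 8 | 574 | 337 | 2,461 | 1,533 | 3,047 | 1,050 | 6,094 | 2,296 |
| First Name | 4 | 4 | 89 | 50 | 212 | 117 | 733 | 210 | 1,038 | 314 |
| Title & Last Name | 6 | 5 | 673 | 156 | 277 | 92 | 1,153 | 178 | 2,109 | 256 |
| Total | 22 | 17 | 1,336 | 543 | 2,950 | 1,742 | 4,933 | 1,438 | 9,241 | 2,866 |

| Address | Location total | Location unique | Contact total | Contact unique | Description total | Description unique | Response total | Response unique | Overall total | Overall unique |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| County | 4,623 | 108 | 28 | 11 | 252 | 78 | 316 | 84 | 5,219 | 157 |
| Village | 4,522 | 269 | 58 | 33 | 2,191 | 216 | 3,126 | 208 | 9,897 | 366 |
| Street | 2,058 | 315 | 40 | 24 | 813 | 198 | 1,755 | 295 | 4,666 | 492 |
| Village & Street | 2,040 | 277 | 21 | 12 | 2,750 | 762 | 3,990 | 854 | 8,801 | 1,541 |
| Total | 13,243 | 969 | 147 | 80 | 6,006 | 1,254 | 9,187 | 1,441 | 28,583 | 2,556 |

| Contact | Location total | Location unique | Contact total | Contact unique | Description total | Description unique | Response total | Response unique | Overall total | Overall unique |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cell Phone | 1 | 1 | 2,509 | 1,640 | 316 | 198 | 2,174 | 778 | 5,000 | 2,171 |
| Landline | 2 | 2 | 372 | 303 | 166 | 139 | 820 | 342 | 1,360 | 679 |
| Email Addr | 0 | 0 | 38 | 24 | 33 | 20 | 24 | 17 | 95 | 53 |
| Total | 3 | 3 | 2,919 | 1,967 | 515 | 357 | 3,018 | 1,137 | 6,455 | 2,903 |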
4.2 Information Disclosure Analysis
In Table 3, we summarize the degree of personal information disclosure in each of the Location, Contact, Description⁷, and Response fields, where the first three fields are filled in by thread initiators, and the Response field can be appended to at any time by the thread initiator or others. According to the table, we successfully identified one or more addresses in the Location field of 80% of the threads, and one or more contacts in the Contact field of 55% of the threads. Interestingly, the comments in the Description field allow us to extract personal names from 25% and addresses from 32% of the threads, which implies that thread initiators often mentioned personal names and locations in the description of events or needs. On the other hand, while the Response field contains relatively higher ratios of personal names (34%) and addresses (38%), the field also contains contacts in 33% of the threads, which indicates that the respondents to a thread often included their own contact information for further communication.
We further provide a breakdown of the disclosure degree of each personal information category in each field in Table 4. From the table, we see that personal names appeared more frequently in the Description and Response fields, which supports our analysis that around 65% of the disclosed personal names belong to people who were affected by the disaster, as the names of people who made the comments tend to be left in the Contact field. Also, full names were used most of the time, as unique personal identification was extremely important in such emergency conditions. Similarly, more than half of the address occurrences were identified in the Description and Response fields even though a Location field is provided, as users tended to provide more location-relevant information in the event description and in follow-up communications. The identifiable contact information is primarily cell phone numbers (74%) and landline numbers (23%). This again shows that people prefer to communicate synchronously via phone for urgent and critical purposes.
4.3 Privacy Leakage
Table 5: The occurrence numbers of successful bindings between personal names, addresses and contacts (cell phones, landline phones, and email addresses)

| Name | Landline total | Landline unique | Cell Phone total | Cell Phone unique | Email total | Email unique | Address total | Address unique | Overall total | Overall unique |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Full Name | 124 | 89 | 1,642 | 596 | 7 | 7 | 2,537 | 1,100 | 4,310 | 1,792 |
| First Name | 26 | 15 | 311 | 124 | 0 | 0 | 175 | 98 | 512 | 237 |
| Title & Last Name | 120 | 85 | 1,349 | 718 | 23 | 8 | 345 | 185 | 1,837 | 996 |
| Total | 270 | 189 | 3,302 | 1,438 | 30 | 15 | 3,057 | 1,383 | 6,659 | 3,025 |
Having extracted the personal information classified
in three categories, we proceed to
gauge the leakage of privacy information on the MKER
forum. Here we refer to privacy information as the
binding of a piece of personally identifiable information
(PII) and its associated personal information. More
specifically, we see a successful binding
between a personal name and a corresponding address
or contact as an instance of privacy leakage.
We perform the bindings between the extracted
personal names and personal information using the
positions of their appearance in each thread.
Specifically, if a personal name appears in the same
field as a certain contact (or address) and the
distance between them is within 30 characters, the
contact (or address) is considered bound to the
personal name. If multiple names are simultaneously
bound to the same personal information, only the
closest name (in terms of character distance) is
considered successfully bound.
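The binding rule can be sketched as follows. This is an illustrative reading rather than the original implementation: the text does not say how the distance between a name and a contact is measured, so the sketch assumes it is the number of characters between the two spans, and the helper names `gap` and `bind` are hypothetical.

```python
MAX_DISTANCE = 30  # characters, the threshold stated in the text


def gap(a_start, a_end, b_start, b_end):
    """Characters between two spans; 0 if they touch or overlap (assumed metric)."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def bind(names, items, max_distance=MAX_DISTANCE):
    """Bind each contact or address to the closest name in the same field.

    names and items are lists of (value, start, end) character spans taken
    from one field of one thread. Returns a list of (name, item) pairs.
    """
    pairs = []
    for item_value, item_start, item_end in items:
        best = None  # (distance, name) of the closest candidate so far
        for name_value, name_start, name_end in names:
            d = gap(name_start, name_end, item_start, item_end)
            if d <= max_distance and (best is None or d < best[0]):
                best = (d, name_value)
        if best is not None:
            pairs.append((best[1], item_value))
    return pairs


# Fictional spans: two names and one phone number in the same field.
names = [("Mr. Chang", 0, 9), ("Ms. Yang", 40, 48)]
items = [("0911-123456", 10, 21)]
print(bind(names, items))
```

Both names in the usage example lie within 30 characters of the phone number, so the closest-name rule decides the binding; an item with no name in range is simply left unbound.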
We list the number of occurrences of privacy leakage in Table 5. The table shows that we are able to infer 1,438 person-cell-phone pairs and 1,383 person-address pairs based on the extracted personal information, whereas the bindings for landline numbers and e-mail addresses are far fewer, mainly because occurrences of these two contact categories are also much fewer. Among the names successfully bound to certain information, 60% are full names, while 33% are titles followed by the respective last names. That most disclosures involve full names and cell phone numbers makes the privacy leakage a serious issue, as the full name and cell phone number of an individual are sufficient to perform fraud and other malicious activities against him/her.
We also summarize the proportions of addresses and
the three categories of contact information with and
without name bindings in Figure 8. The
graph shows that around half of addresses are
successfully bound to individuals, while around two
thirds of cell phone numbers are successfully bound.
In contrast, merely 28% of landline numbers are
bound to personal names. We believe this can be
attributed to the fact that many of the landlines in
users' comments are owned by governmental or
non-governmental organizations related to emergency
rescue or relief, such as Red Cross branches,
temples, and local police offices.
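As a quick arithmetic check, the proportions quoted above appear to follow from dividing the unique bound counts in Table 5 by the unique extracted counts in the Overall columns of Table 4. The snippet below assumes that this is how the shares were derived; the dictionary names are only illustrative.

```python
# Unique items extracted overall (Table 4) and unique items bound to a
# name (Table 5). That the shares are computed this way is an assumption.
extracted = {"address": 2556, "cell phone": 2171, "landline": 679, "email": 53}
bound = {"address": 1383, "cell phone": 1438, "landline": 189, "email": 15}

bound_share = {kind: bound[kind] / extracted[kind] for kind in extracted}

for kind, share in bound_share.items():
    print(f"{kind}: {share:.0%} bound to a name")
```

Under this reading, addresses come out slightly above one half, cell phone numbers at roughly two thirds, and landline numbers at about 28%, in line with the figures in the text.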
Figure 8: The percentages of addresses and the
three categories of contact information with and
without name bindings
To sum up, based on a total of 5.2 MB of users' plain-text comments on the MKER forum, we infer 3,025 privacy leakage incidents using a fully automated method with simple table lookups and regular expressions. A malicious user who aims to exploit the dataset could extract much more "useful" privacy information via manual analysis. For example, the ages and genders of people are usually provided in requests inquiring about the status of certain people. Moreover, users' comments frequently reveal the relationship between the commenter and the persons he/she mentioned (e.g., uncle, aunt, or grandchild); sometimes the commenter even provided the complete member list of a family. As we merely use the MKER forum as an example and rely on automated analysis only, the overall privacy crisis due to crisis response websites would be devastatingly severe if we consider the number of similar services, the number of crisis events, and the number of malicious users around.
5 Solutions to Privacy Leakage
In this section, we discuss potential solutions
to the identified privacy leakage due to crisis
responses on the web.
5.1 Remedy to Current Services
For services that are currently running, such as the MKER forum, the web server should first prevent the site's content from being crawled by search engines. A web service can adopt the Robots Exclusion Standard⁸ by putting robots.txt under the root directory of the web server to instruct web spiders not to crawl the site's content. However, the Robots Exclusion Standard is advisory rather than enforced: a web spider or a malicious user can simply ignore robots.txt and perform data crawling. Secondly, the site owners can mask all personal information, such as contacts and addresses, using table lookup and regular expression techniques as we did in Section 4.1.
Although the two methods cannot help much if the site's content has already been stored by a search engine or any third party, at the very least they are still helpful in preventing further privacy leakage caused by the information currently available on the sites.
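For illustration, the first remedy could look like the following sketch. It embeds a robots.txt that disallows the whole site and uses Python's standard `urllib.robotparser` to check how a compliant crawler would interpret it. The bot name and URL are hypothetical, and the check only models well-behaved crawlers, since compliance is voluntary.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt asking every compliant crawler to stay away from the
# whole site. It would be served from the web root as /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical crawler name and thread URL, used only for illustration.
allowed = parser.can_fetch("ExampleBot", "http://forum.example.org/thread/123")
print("crawling allowed:", allowed)
```

A crawler that honors the file would skip every page of the site, while a crawler that ignores it is unaffected, which is why masking the stored content is suggested as a second measure.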
5.2 Personal Information Filtering and Protection
For future disaster response web services, we
suggest that such services should adopt certain
mechanisms for personal information filtering and
protection to avoid privacy leakage.
To facilitate personal information filtering, the
system should be able to recognize all possible
personal names and addresses, which we consider
feasible as personal names can be checked by using a
name dictionary and Internet map services can be
used for address checking. When a user is about to
post comments containing personal names or
addresses, the system can warn him/her regarding the
possible risks that may be caused by the current
operation.
The system can also mask out significant parts of
the names and addresses, such as the given name and
the street-level and finer parts of an address.
If a reader needs to access the masked information,
the system could provide a mechanism to grant
access. For example, the interested reader could be
required to authenticate with an SMS verification
code sent by a certified emergency response organization.
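The masking and gated access described above could look roughly like the sketch below. It assumes that names and addresses have already been split into their parts; the `MaskedField` type, the function names, and the `reader_is_verified` flag (standing in for a successful SMS verification) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MaskedField:
    public: str    # shown to every reader
    original: str  # shown only to verified readers


def mask_name(surname: str, given_name: str) -> MaskedField:
    """Keep the surname and replace the given name with asterisks."""
    return MaskedField(
        public=f"{surname} {'*' * len(given_name)}",
        original=f"{surname} {given_name}",
    )


def mask_address(levels: List[str], keep: int = 2) -> MaskedField:
    """Mask everything finer than the first `keep` levels.

    `levels` is ordered from coarse to fine, e.g. city, district, street, number.
    """
    visible = levels[:keep]
    hidden = ["***"] * max(len(levels) - keep, 0)
    return MaskedField(
        public=", ".join(visible + hidden),
        original=", ".join(levels),
    )


def view(field: MaskedField, reader_is_verified: bool) -> str:
    """Return the full value only to readers who passed verification."""
    return field.original if reader_is_verified else field.public
```

With `keep=2`, an address given as city, district, street, and house number is shown publicly as the city and district followed by two masked levels.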
To protect personal contact information from being
disclosed, disaster response systems may provide
"forwarding" mechanisms. For example, if a user
would like to leave a mobile phone number in a
comment, the system can "transform" the phone
number into an extension number that is visible to
the public. A reader of the comment must then dial
the system's representative phone number followed
by the extension number to reach the comment
poster, without learning the poster's actual
number. The system has to pay the communication
cost on behalf of the callers for their phone
conversations with the comment poster, but this
cost can be kept low by limiting the duration of
each forwarded call. In addition, the system should
adopt an authentication mechanism to prevent Sybil
attacks, in which malicious users register a large
number of fake mobile phone numbers as the system's
extension numbers and misuse them.
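The bookkeeping behind such a forwarding mechanism might be organized as in the following sketch. The class and function names, the starting extension number, and the 180-second call limit are assumptions made for illustration; the telephony back end and the actual ownership check (for example, an SMS confirmation) are outside the scope of the example.

```python
from typing import Dict, Optional


class ForwardingDirectory:
    """Maps private phone numbers to public extension numbers."""

    def __init__(self, max_call_seconds: int = 180, first_extension: int = 1000):
        # Cap on each forwarded call, to bound the cost borne by the system.
        self.max_call_seconds = max_call_seconds
        self._next_extension = first_extension
        self._ext_by_phone: Dict[str, int] = {}
        self._phone_by_ext: Dict[int, str] = {}

    def register(self, phone: str, ownership_verified: bool) -> int:
        """Assign an extension, but only to a number whose ownership was verified."""
        if not ownership_verified:
            raise PermissionError("phone number ownership has not been verified")
        if phone in self._ext_by_phone:
            return self._ext_by_phone[phone]  # one extension per phone number
        extension = self._next_extension
        self._next_extension += 1
        self._ext_by_phone[phone] = extension
        self._phone_by_ext[extension] = phone
        return extension

    def route(self, extension: int) -> Optional[str]:
        """Look up the real number; intended for the forwarding back end only."""
        return self._phone_by_ext.get(extension)


def publish_comment(comment: str, phone: str, extension: int) -> str:
    """Replace the poster's phone number with the public extension."""
    return comment.replace(phone, f"ext. {extension}")
```

Refusing to register unverified numbers is only one simple guard against mass registration; limits per account or per time window would be natural additions.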
By adopting both personal information filtering and
protection mechanisms, we believe that the privacy
leakage on crisis response web services can be
mitigated to a reasonable degree.
6 Conclusion and Future Work
In this paper, we have identified the privacy risks
which are caused by users' communication on disaster
response websites. Based on an analysis of the MKER
forum, we have shown that the revelation of personal
information and the resulting privacy leakage on
such services is indeed a serious issue. We have
also proposed several means to prevent such privacy
leakage on current and future disaster response
services, respectively. In the future, we plan to
implement the proposed mechanisms for personal
information filtering and protection (cf.
Section 5.2) and make them publicly
available to disaster response service providers, in
the hope of resolving the Privacy Crisis due to Crisis Response on the Web.
References
[1]
L. Palen and S. B. Liu, "Citizen communications in crisis: anticipating a
future of ICT-supported public participation," in Proc. CHI
'07. ACM, pp. 727-736.
[2]
S. Vieweg, L. Palen, S. B. Liu, A. L. Hughes, and J. Sutton, "Collective
intelligence in disaster: Examination of
the phenomenon in the aftermath of the 2007 Virginia Tech shooting,"
2008.
[3]
J. Sutton, L. Palen, and I. Shklovski, "Backchannels on the front lines:
Emergent uses of social media in the 2007 Southern California wildfires," in
Proc. ISCRAM '08.
[4]
L. Palen, S. Vieweg, J. Sutton, S. B. Liu, and A. Hughes, "Crisis informatics:
Studying crisis in a networked world," in Proc. e-SS '07.
[5]
F. D. L. Wigand, "Twitter takes wing in government: diffusion, roles, and
management," in Proc. dg.o '10. Digital Government Society of North America, 2010, pp. 66-71.
[6]
J. G. Taylor, S. C. Gillette, and F. C. S. C. (U.S.),
Communicating with wildland interface
communities during wildfire. USGS
Fort Collins Science Center, 2005.
[7]
SAHANA. [Online]. Available: http://sahanafoundation.org/
[8]
RESCUE project. [Online]. Available:
http://www.itr-rescue.org/index.php
[9]
RESCUE Disaster Portal. [Online]. Available:
http://www.disasterportal.org/
[10]
L. Palen and S. Vieweg, "The emergence of online widescale interaction in
unexpected events: assistance, alliance & retreat," in Proc. CSCW
'08. ACM, 2008, pp. 117-126.
[11]
S. Vieweg, A. L. Hughes, K. Starbird, and L. Palen, "Microblogging during two
natural hazards events: what Twitter may contribute to situational
awareness," in Proc. CHI '10.
ACM, 2010, pp. 1079-1088.
[12]
K. Starbird, L. Palen, A. L. Hughes, and S. Vieweg, "Chatter on the red: what
hazards threat reveals about the social life of microblogged information,"
in Proc. CSCW '10. ACM, 2010,
pp. 241-250.
[13]
I. Shklovski, L. Palen, and J. Sutton, "Finding community through information
and communication technology in disaster response," in Proc. CSCW
'08. ACM, 2008, pp. 127-136.
[14]
Y. Qu, P. F. Wu, and X. Wang, "Online community response to major disaster: A
study of Tianya forum in the 2008 Sichuan earthquake," Hawaii
International Conference on System Sciences, pp. 1-11, 2009.
[15]
C. Torrey, M. Burke, M. Lee, A. Dey, S. Fussell, and S. Kiesler, "Connected
giving: Ordinary people coordinating disaster relief on the Internet,"
Hawaii International Conference on System Sciences, p. 179a, 2007.
[16]
R. Herold, "Addressing privacy issues during disaster recovery,"
Information Systems Security, vol. 14, no. 6, pp. 16-22, 2006.
[17]
"Historical typhoon database provided by central weather bureau, Taiwan,"
http://rdc28.cwb.gov.tw/data.php.
[18]
"Extreme events and disasters from typhoon Morakot - the biggest threat ever
to Taiwan," Environmental Protection Administration, Executive Yuan, ROC,
Tech. Rep., 2009.