Page 106 - My FlipBook

P. 106

Collaborative Projects所
內
合
作
計
畫

In our preliminary experiments (Figure 2), we found that we can still correctly recover the le system from
a crash without maintaining a journal (e.g., using the ext2 instead of ext4 le system) when we used crash-
deterministic SSDs. Moreover, for the database layer, the number of fsync operations (which cleans the
temporary state maintained by the current le system and triggers an SSD sync action) required to perform a
single transaction can be reduced from 5 to 1 when crash-deterministic SSDs are used.
The main drawback of employing a crash-deterministic SSD design is that it relies on complex data
structures and algorithms, potentially giving rise to implementation errors in the firmware layer. Even a
one-byte error in a storage device can disrupt the entire indexing mechanism and hence all stored data.
To ensure the correctness of our rmware implementations, we are utilizing formal veri cation techniques.
We are developing a system to check the crash-deterministic properties of our firmware implementation
by integrating several verification tools, including C-language verification tools (e.g., Rosette+Serval) and
theorem provers (e.g., Agda). This system will automatically check the C program written by firmware
developers and con rm that the program is crash-deterministic or, if it is not, it will nd a counterexample.

Figure 2 : The advantage of crash-deterministic SSD.

We assert that it is optimal to begin designing crash recovery mechanisms at the storage device layer. Both
MIT and the University of Washington have proposed crash recovery systems over the past two years. Their
crash recovery mechanisms are focused on the le system level. However, design complexity is much higher
at the le system level compared to at the rmware level, so adding a veri ed crash-recovery mechanism to
the le system level incurs higher veri cation costs. Taking the MIT le system DFSCQ as an example, it takes
around ten person-years for system implementation and to prove system correctness. Even worse, DFSCQ is
three times slower than the standard ext4 le system in terms of performance. If we use crash-deterministic
SSDs as the starting point for our crash-recovery mechanism, we expect that overall system performance will
not diminish, despite requiring formal veri cation of its correctness. In fact, system performance might even
be improved due to the removal of redundant crash recovery mechanisms (such as journaling) from other
system layers. Thus, overall implementation time can be reduced to 2~4 person-years because our formal
veri cation system has a higher degree of automation than that employed by MIT.

104

101 102 103 104 105 106 107 108 109 110 111