The empirical mode decomposition (EMD) is a nonlinear, truly adaptive method with good time-domain localization for analyzing non-stationary, complex data. The EMD has proven useful in a wide range of applications. However, because the sifting process, the most essential step of the EMD, is nonlinear and complex, the EMD still lacks a firm mathematical foundation and a transparent physical description. Here, we embark on constructing a mathematical theory of the sifting operator. We first show that the sifting operator can be expressed as the data plus the sum of the responses to impulses at the extrema, each multiplied by the data value there. This expression is then used to investigate the adaptive nature and the localizing effect of the EMD. Alternatively, the sifting operator can be represented by a sifting matrix, which depends nonlinearly on the distribution of the extrema. Based on the eigen-decomposition of the sifting matrix, we analyze the transfer function of the sifting process. Finally, we address what an intrinsic mode function (IMF) is from the wave perspective by exploring the physical basis of the IMFs.
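The impulse-response and matrix views of sifting can be illustrated with a minimal sketch. It is not the construction used in the paper: it assumes piecewise-linear envelopes in place of the cubic splines normally used in EMD, and it treats both end samples as knots of both envelopes, a boundary convention chosen here only for simplicity. The function names (`local_extrema`, `envelope`, `sift_once`, `sifting_matrix`) are hypothetical. With the extrema locations frozen, one sifting step is linear in the data, so column k of the matrix is the response to a unit impulse at sample k; the dependence on the data enters only through where the extrema lie.

```python
import math


def local_extrema(x):
    """Return (maxima, minima): indices of strict interior local extrema."""
    inner = range(1, len(x) - 1)
    maxima = [i for i in inner if x[i - 1] < x[i] > x[i + 1]]
    minima = [i for i in inner if x[i - 1] > x[i] < x[i + 1]]
    return maxima, minima


def envelope(x, knots):
    """Piecewise-linear interpolation of x through the given knot indices.

    Assumption: linear interpolation stands in for the cubic spline.
    """
    env = [0.0] * len(x)
    for a, b in zip(knots, knots[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            env[i] = (1.0 - t) * x[a] + t * x[b]
    return env


def sift_once(x):
    """One sifting step: data minus the mean of upper and lower envelopes."""
    n = len(x)
    maxima, minima = local_extrema(x)
    upper = envelope(x, [0] + maxima + [n - 1])  # end samples used as knots
    lower = envelope(x, [0] + minima + [n - 1])
    return [x[i] - 0.5 * (upper[i] + lower[i]) for i in range(n)]


def sifting_matrix(x):
    """Matrix S with S @ x == sift_once(x), for the extrema pattern of x.

    Column k is the response to a unit impulse at sample k while the
    extrema locations of x are held fixed.
    """
    n = len(x)
    maxima, minima = local_extrema(x)
    up_knots = [0] + maxima + [n - 1]
    low_knots = [0] + minima + [n - 1]
    columns = []
    for k in range(n):
        impulse = [0.0] * n
        impulse[k] = 1.0
        upper = envelope(impulse, up_knots)
        lower = envelope(impulse, low_knots)
        columns.append(
            [impulse[i] - 0.5 * (upper[i] + lower[i]) for i in range(n)]
        )
    # Transpose the list of columns into a list of rows.
    return [[columns[k][i] for k in range(n)] for i in range(n)]


if __name__ == "__main__":
    signal = [math.sin(0.3 * i) + 0.5 * math.sin(1.7 * i) for i in range(40)]
    S = sifting_matrix(signal)
    direct = sift_once(signal)
    via_matrix = [sum(S[i][j] * signal[j] for j in range(40)) for i in range(40)]
    print(max(abs(a - b) for a, b in zip(direct, via_matrix)))
```

In this sketch, only columns at the envelope knots differ from the identity, which mirrors the statement that the sifted output is the data plus impulse responses located at the extrema. Each row of the matrix sums to zero, so a constant offset is removed by a single sifting step under these assumptions.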