Heterogeneous sensor data fusion is a challenging field that has attracted significant interest in recent years. Two of its central challenges are learning from data with missing values and finding shared representations of multimodal data that improve inference and prediction. In this paper, we propose a multimodal data fusion framework, the deep multimodal encoder (DME), which uses deep learning techniques for sensor data compression, missing data imputation, and new modality prediction in multimodal scenarios. Whereas traditional methods capture only intramodal correlations, DME mines both the intramodal correlations in its initial layers and the enhanced intermodal correlations in its deeper layers. This allows the statistical structure of sensor data to be better exploited for data compression. By incorporating our new objective function, DME performs missing data imputation on sensor data effectively. The shared multimodal representation learned by DME can also be used directly to predict new modalities. In experiments on a real-world dataset with three modalities, collected from a 40-node agricultural sensor network, DME achieves a missing data imputation root mean square error (RMSE) that is only 20% of that of traditional methods such as K-nearest neighbors and sparse principal component analysis, and this performance is robust across different missing rates. DME can also reconstruct the temperature modality from humidity and illuminance with an RMSE of 7 °C, directly from a highly compressed (2.1%) shared representation that was learned from incomplete (80% missing) data.
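The abstract does not spell out DME's objective function, so the following is only a minimal sketch of a common convention for this setting, not the paper's method. It assumes that training on incomplete data computes reconstruction error only over observed entries, and that imputation quality is scored as RMSE over entries that were hidden. The helper names `masked_rmse` and `random_mask`, and the toy readings (rows as nodes, columns as temperature, humidity, and illuminance), are hypothetical.

```python
import math
import random


def masked_rmse(pred, target, mask):
    """RMSE over the entries where mask[i][j] is True; all other entries are ignored.

    Hypothetical helper: the same function can act as a training loss on observed
    entries (mask = observed) or as an imputation score on hidden entries
    (mask = inverted observation mask).
    """
    sq_err, count = 0.0, 0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if m:
                sq_err += (p - t) ** 2
                count += 1
    if count == 0:
        raise ValueError("mask selects no entries")
    return math.sqrt(sq_err / count)


def random_mask(n_rows, n_cols, missing_rate, seed=0):
    """Observation mask: True = observed, False = missing.

    Each entry is dropped independently with probability `missing_rate`
    (an assumed missingness model, e.g. 0.8 for 80% missing data).
    """
    rng = random.Random(seed)
    return [[rng.random() >= missing_rate for _ in range(n_cols)]
            for _ in range(n_rows)]


# Toy example: 2 nodes x 3 modalities (temperature, humidity, illuminance).
target = [[20.0, 55.0, 300.0], [22.0, 50.0, 320.0]]
pred = [[21.0, 55.0, 300.0], [22.0, 47.0, 320.0]]
observed = [[True, False, True], [False, True, True]]
hidden = [[not m for m in row] for row in observed]

train_loss = masked_rmse(pred, target, observed)  # error on observed entries only
impute_rmse = masked_rmse(pred, target, hidden)   # error on the hidden entries
```

Restricting the loss to observed entries keeps the model from being penalized against placeholder values in the missing slots, while evaluating on the hidden entries measures how well those slots are filled in.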