Heterogeneous sensor data fusion is a challenging problem that has attracted significant interest in recent years. In this paper, we propose a neural-network-based multimodal data fusion framework named the deep multimodal encoder (DME). Through a new objective function, DME better exploits both the intra- and inter-modal correlations of multimodal sensor data to recover missing values, and the shared representation it learns can be used directly for prediction tasks. In experiments on real-world sensor data, DME shows strong performance on missing-data imputation and new-modality prediction. Compared with traditional algorithms such as kNN and Sparse PCA, DME is more expressive, more robust, and scales better to large datasets.
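To make the idea of a shared-representation objective concrete, the following is a minimal NumPy sketch of a two-modality encoder with intra-modal reconstruction terms and an inter-modal (cross-reconstruction) term. All names, dimensions, and the specific form of the loss are illustrative assumptions for exposition; this is not the paper's actual DME architecture or objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: two sensor modalities and a shared code.
D1, D2, H = 8, 6, 4

# Randomly initialised weights (training loop omitted; this sketches
# only the forward pass and the objective).
W1 = rng.normal(scale=0.1, size=(H, D1))   # encoder, modality 1
W2 = rng.normal(scale=0.1, size=(H, D2))   # encoder, modality 2
V1 = rng.normal(scale=0.1, size=(D1, H))   # decoder, modality 1
V2 = rng.normal(scale=0.1, size=(D2, H))   # decoder, modality 2

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x1, x2):
    # The shared representation fuses both modalities.
    return sigmoid(W1 @ x1 + W2 @ x2)

def decode(h):
    return V1 @ h, V2 @ h

def objective(x1, x2):
    # Intra-modal terms: reconstruct each modality from the shared code.
    h = encode(x1, x2)
    r1, r2 = decode(h)
    intra = np.sum((r1 - x1) ** 2) + np.sum((r2 - x2) ** 2)
    # Inter-modal term: encode modality 1 alone (as if modality 2 were
    # missing) and penalise the error in predicting modality 2 from it.
    h_only1 = sigmoid(W1 @ x1)
    _, r2_cross = decode(h_only1)
    inter = np.sum((r2_cross - x2) ** 2)
    return intra + inter

x1 = rng.normal(size=D1)
x2 = rng.normal(size=D2)
loss = objective(x1, x2)
```

In a sketch like this, imputing a missing modality amounts to encoding from the observed modality alone and decoding the absent one, which is why the cross-reconstruction term pushes the shared code to carry information about both modalities.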