Vehicle detection with visual sensors like lidar and camera is one of the critical functions enabling autonomous driving. While they generate fine-grained point clouds or high-resolution images with rich information in good weather conditions, they fail in adverse weather (e.g., fog) where opaque particles distort lights and significantly reduce visibility. Thus, existing methods relying on lidar or camera experience significant performance degradation in rare but critical adverse weather conditions. To remedy this, we resort to exploiting complementary radar, which is less impacted by adverse weather and becomes prevalent on vehicles. In this paper, we present Multimodal Vehicle Detection Network (MVDNet), a two-stage deep fusion detector, which first generates proposals from two sensors and then fuses region-wise features between multimodal sensor streams to improve final detection results. To evaluate MVDNet, we create a procedurally generated training dataset based on the collected raw lidar and radar signals from the open-source Oxford Radar Robotcar. We show that the proposed MVDNet surpasses other state-of-the-art methods, notably in terms of Average Precision (AP), especially in adverse weather conditions. The code and data are available at https://github.com/qiank10/MVDNet.