In this paper, we compare the performance of four models on a retrieval-based question-answering dialogue task using two moderately sized corpora (~10,000 utterances). One is a statistical model based on cross-language relevance, while the others are deep neural networks that use the BERT architecture with different retrieval methods. The statistical model has previously outperformed LSTM-based neural networks on a similar task, whereas BERT has been shown to perform well on a variety of NLP tasks, achieving state-of-the-art results on many of them. Our results show that the statistical cross-language relevance model outperforms the BERT-based architectures at learning question-answer mappings. BERT achieves better results when mapping new questions to existing questions.