Data Quality is a cross-disciplinary and often domain specific problem due to the importance of fitness for use in the definition of data quality metrics. It has been the target of research and development for over 4 decades by business analysts, solution architects, database experts and statisticians to name a few. However, the changing landscape of data quality challenges indicate the need for holistic solutions. As a first step towards bridging any gaps between the various research communities, we undertook a comprehensive literature study of data quality research published in the last two decades. In this study we considered a broad range of Information System (IS) and Computer Science (CS) publication (conference and journal) outlets. The main aims of the study were to understand the current landscape of data quality research, to create better awareness of (lack of) synergies between various research communities, and, subsequently, to direct attention towards holistic solutions. In this paper, we present a summary of the findings from the study, that include a taxonomy of data quality problems, identification of the top themes, outlets and main trends in data quality research, as well as a detailed thematic analysis that outlines the overlaps and distinctions between the focus of IS and CS publications. |