Sound-Dr: Reliable Sound Dataset and Baseline Artificial Intelligence System for Respiratory Illnesses



Published Sep 4, 2023
Van Truong Hoang, Quang Nguyen, Quoc Cuong Nguyen, Xuan Phong Nguyen, Hoang Nguyen


As the burden of respiratory diseases continues to fall on society worldwide, this paper proposes a high-quality and reliable dataset of human sounds for studying respiratory illnesses, including pneumonia and COVID-19. It consists of coughing, mouth breathing, and nose breathing sounds together with metadata on related clinical characteristics. We also develop a proof-of-concept system for establishing baselines and benchmarking against multiple datasets, such as Coswara and COUGHVID. Our comprehensive experiments show that the Sound-Dr dataset offers richer features, yields better performance, and is more robust to dataset shifts in various machine learning tasks. It is promising for a wide range of real-time applications on mobile devices. The proposed dataset and system will serve as practical tools to support healthcare professionals in diagnosing respiratory disorders. The dataset and code are publicly available here:




Deep Learning, Anomaly, Respiratory, Baseline, Healthcare

Regular Session Papers