Deepfake Detection in Video and Audio Clips: A Comprehensive Survey and Analysis
Abstract
Deepfake (DF) technology has become a major concern because of its potential for misuse, including privacy violations, misinformation, and threats to the integrity of digital media. Although deep learning (DL) algorithms for detecting DFs have advanced significantly, reliably distinguishing real from manipulated content remains difficult because DF generation techniques evolve rapidly. This study addresses two key issues: the need for a comprehensive review of current DF detection methods, and the challenge of achieving high detection accuracy at low computational cost. We conducted a systematic literature review to evaluate DF detection algorithms in terms of performance, computational efficiency, and robustness. The review covers Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, hybrid models, and specialized approaches such as spectral and phonetic analysis. Our findings show that although some methods achieve accuracy of up to 94% in controlled environments, they often struggle to generalize across diverse DF applications. Hybrid models that combine CNNs and LSTMs typically offer a better balance between accuracy and computational efficiency. This paper provides insight into the current state of DF detection and highlights the need for adaptive models that can keep pace with evolving DF generation techniques.
This work is licensed under a Creative Commons Attribution 4.0 International License.