We tested whether popular AI tools that claim to detect deepfakes actually work on real viral videos. To do this, we collected 20 clips that had spread online — 10 confirmed deepfakes and 10 real videos of politicians and celebrities. Then, we ran all of them through two different publicly available deepfake detectors (Deepware AI and UB’s DeepFake-O-Meter), compared their results, and measured how often they got it right or wrong.
Why it matters:
Deepfakes of politicians and celebrities spread rapidly on social media, and journalists, platforms, and ordinary viewers increasingly rely on free online detectors to decide what to trust. If those tools miss obvious fakes or wrongly flag genuine clips, they can amplify misinformation instead of containing it. We therefore checked whether two widely used public detectors could reliably separate authentic videos from deepfakes that had already fooled millions of people online, comparing their results side by side and measuring how consistent they were.
📂 Key finding:
The detectors often got it wrong. Sometimes they flagged real videos as fake, and other times they missed obvious deepfakes that had already gone viral. In short, the tools weren’t reliable enough to trust as a safety net against misinformation.
20-60% Detection Rate
2 Platforms
20 Viral Videos
Impact & Applications
🗳️
Election Security
Protecting democratic processes by identifying fake political content before it can influence voters and spread misinformation during critical election periods.
📻
Media & Journalism
Helping news organizations and social media platforms quickly verify the authenticity of viral content to prevent the spread of fabricated news.
⚖
Legal Evidence
Supporting legal proceedings by providing systematic evaluation methods for video evidence authenticity in courts and investigations.
Digital Safety
Protecting individuals from deepfake harassment and identity theft by improving detection systems on social platforms and messaging apps.
Current Challenges
High False Negative Rates: Current tools missed 40-80% of confirmed deepfakes in our sample
Platform Inconsistency: Different tools give conflicting results on same content
Cultural Bias: Performance varies across different demographic groups
Technical Limitations: Resolution and compression affect detection accuracy
🎥 Research Demo
TODO: Brief explanation of what viewers will see in the demo video and its key highlights.
📊 Research Infographic
🎨
Visual Summary
Figure 1A. Confusion matrix for Mapping A (treating “Suspicious” as positive). Heatmap showing true positives (6), false negatives (4), true negatives (10), and false positives (0) for Deepware AI outputs on 20 viral videos. Accuracy = 80%, sensitivity = 60%, specificity = 100%, precision = 100%.
Figure 1B. Confusion matrix for Mapping B (treating “Suspicious” as negative). Heatmap showing true positives (2), false negatives (8), true negatives (10), and false positives (0) for Deepware AI outputs. Accuracy = 60%, sensitivity = 20%, specificity = 100%, precision = 100%.
Figure 2. Per-video detection scores for 10 deepfake samples. Grouped bar graph comparing Deepware AI ensemble scores (blue) and UB DeepFake-O-Meter mean detector scores (red). Highlights large variance across videos such as Biden, Trump, and Rashmika Mandanna.
Figure 3. Cross-platform agreement on deepfake detection. Scatter plot comparing Deepware AI ensemble scores (x-axis) to UB DeepFake-O-Meter mean scores (y-axis). Each point represents a video sample; labels identify subjects. Wide scatter indicates inconsistent cross-platform agreement.
Figure 4. Score distribution by manipulation type. Boxplot comparing detection scores for face-swap/identity-replacement deepfakes vs. lip-synthesis/dubbing manipulations. Face-swap examples (e.g., Obama, Morgan Freeman) show higher median detection likelihoods.
📄 Abstract
Deepfakes pose serious risks to public trust and information integrity; we tested whether publicly available detection tools reliably identify viral real-world deepfakes. We hypothesized that off-the-shelf detectors would show inconsistent accuracy and produce both false positives and false negatives when applied to in-the-wild videos. To test this, we evaluated 20 viral clips (10 confirmed deepfakes, 10 authentic controls) using two public detection platforms and recorded ensemble and per-model likelihoods across more than ten detectors. Results revealed substantial cross-platform disagreement: one platform's ensemble flagged only a minority of confirmed deepfakes while the research platform produced extreme per-model score variance, so that sensitivity depended strongly on how an intermediate "Suspicious" label was treated. Depending on the binary mapping used, measured sensitivity varied widely while specificity remained high for this sample. We conclude that current public detectors provide useful signals but are not yet reliable as sole arbiters of authenticity for viral content; we recommend publishing full per-video numeric outputs, versioned model identifiers, and pairing automated screening with human expert review.
Methodology
Video Sample Selection
We compiled a convenience sample of 20 viral videos: 10 confirmed deepfakes and 10 authentic control videos featuring prominent political and entertainment figures. Deepfake items were drawn from documented repositories, academic demonstrations, BBC News segments, and viral media that had been publicly debunked by fact-checking organizations.
For full transparency: The exact titles and descriptions of all Deepfake and Authentic video samples used in this study are listed in the Video Sample Collection & Dataset section below. Each sample is available for download, and the lists match the dataset files used for all analyses. Please refer to these lists for precise sample documentation and replication.
👥
Notable Test Cases Include:
Political Figures: Barack Obama BBC News demonstration, Joe Biden's "pistachio story", Donald Trump LipSynthesis, Amit Shah reservation video
Celebrities: Morgan Freeman Singularity video, Anderson Cooper LipSynthesis, Bill Gates deepfake examples
Indian Entertainment: Aamir Khan & Ranveer Singh political endorsements, Rashmika Mandanna viral video
Since Deepware returns a three-level categorical output, we evaluated two operational mappings (a minimal conversion sketch follows the list below):
Mapping A (Lenient): Treat "Suspicious" as positive detection
Mapping B (Conservative): Only "Deepfake Detected" counts as positive
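For concreteness, the following sketch (in Python, chosen here for illustration; the study itself only reports categorical counts, so this is not the authors' analysis code) converts Deepware's three categorical labels into binary predictions under each mapping:

```python
# Deepware returns one of three categorical labels per video.
LABELS = {"DEEPFAKE DETECTED", "SUSPICIOUS", "NO DEEPFAKE DETECTED"}

def to_binary(label: str, mapping: str = "A") -> bool:
    """True if the label counts as a positive (deepfake) detection."""
    label = label.upper()
    if label not in LABELS:
        raise ValueError(f"Unexpected Deepware label: {label!r}")
    if mapping == "A":                       # Mapping A (lenient)
        return label in {"DEEPFAKE DETECTED", "SUSPICIOUS"}
    return label == "DEEPFAKE DETECTED"      # Mapping B (conservative)

# Example: a clip labeled SUSPICIOUS is a true positive under Mapping A
# but a false negative under Mapping B.
print(to_binary("SUSPICIOUS", "A"), to_binary("SUSPICIOUS", "B"))  # True False
```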
📏 Performance Metrics
We computed standard binary classification metrics: accuracy, sensitivity (recall), specificity, precision, and F1 score using confusion matrices derived from categorical counts.
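As a worked illustration (again in Python, not the original analysis code), the helper below computes these metrics from confusion-matrix counts; plugging in the Mapping A counts reported in Figure 1A (TP = 6, FN = 4, TN = 10, FP = 0) reproduces the 80% accuracy, 60% sensitivity, 100% specificity, and 100% precision values.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if (precision + sensitivity) else 0.0)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}

# Mapping A (Figure 1A): TP=6, FN=4, TN=10, FP=0
print(classification_metrics(tp=6, fp=0, tn=10, fn=4))
# Mapping B (Figure 1B): TP=2, FN=8, TN=10, FP=0
print(classification_metrics(tp=2, fp=0, tn=10, fn=8))
```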
Results
20% Deepfake Detection Rate (Deepware)
40% Suspicious Classifications
98.6% Max Score Variance (Cross-Platform)
8/10 AVSRDD Detection Success
0% False Positives (Authentic Videos)
11 AI Models Tested
Key Findings
Our analysis revealed significant limitations and inconsistencies in current public deepfake detection tools when applied to viral real-world content:
Raw Data Results
Sample Deepfake Clips used for testing:
Viral Deepfake Videos Thrive Of Aamir Khan & Ranveer Singh Endorsing Political Parties – Business Today
Amit Shah Fake Video: Debunking the Fake Video of Amit Shah On Reservations – The Indian Express
AI Deepfake Video Of Actress Rashmika Mandanna Going Viral – Business Today
Anderson Cooper, 4K Original/(Deep) Fake Example - LipSynthesis
Fake Obama created using AI video tool - BBC News
This is not Morgan Freeman - A Deepfake Singularity
President Joe Biden's Magical Pistachio Story (Deepfake AI)
Deepfake example. Original/Deepfake close shot Bill Gates.
2024 Deepfake Example in 4k - ORIGINAL/DEEPFAKE - Bill Gates
Trump 4k Deepfake example – LipSynthesis
Sample Authentic (Real) Clips used for testing:
Aamir Khan-Reena की लव स्टोरी कैसे शुरू हुई, घरवालों ने क्या हंगामा किया, छुपकर शादी क्यों करनी पड़ी.mp4 (Hindi title; roughly: “How Aamir Khan and Reena’s love story began, the uproar in their families, and why they had to marry in secret”)
Anderson Cooper’s tribute to his friend Anthony Bourdain.mp4
Highlights from Obama's farewell address.mp4
HM Amit Shah’s Fiery Remark in Parliament_ ‘Hindu Terrorist Nahi Ho Sakta’ _ Amit Shah _ Rajya Sabha.mp4
Morgan Freeman Re-Enacts The Shawshank Redemption _ The Graham Norton Show.mp4
President Joe Biden Takes the Oath of Office _ Biden-Harris Inauguration 2021.mp4
President Trump's Inaugural Address.mp4
Ranveer Singh On Playing Khilji In Padmaavat _ India Today Exclusive Interview.mp4
Rashmika Mandanna Interview with Anupama Chopra _ Mission Majnu _ Goodbye _ Film Companion.mp4
The next outbreak_ We’re not ready _ Bill Gates _ TED.mp4
Tool 1: Deepware A.I.
All Fake Video Results:
The video titled “Viral Deepfake Videos Thrive Of Aamir Khan & Ranveer Singh Endorsing Political Parties – Business Today”, was scanned on 2025-09-20 at 06:19:29 UTC. The scan result indicates SUSPICIOUS. Model results: Avatarify 39% (no deepfake), Deepware 25% (no deepfake), Seferbekov 75% (suspicious), Ensemble 55% (suspicious). Duration: 181s, 1280x720, 29.97fps, h264; audio: 181s, stereo, 48kHz, AAC.
The video titled “Amit Shah Fake Video: Debunking the Fake Video of Amit Shah On Reservations – The Indian Express”, scanned 2025-09-20 at 06:25:41 UTC. SUSPICIOUS. Model results: Avatarify 0%, Deepware 20%, Seferbekov 97% (deepfake), Ensemble 67% (suspicious). Duration: 162s, 1280x720, 30fps, h264; audio: 162s, stereo, 48kHz, AAC.
The video titled “AI Deepfake Video Of Actress Rashmika Mandanna Going Viral – Business Today”, scanned 2025-09-20 at 06:28:12 UTC. NO DEEPFAKE DETECTED. Model results: Avatarify 24%, Deepware 16%, Seferbekov 46%, Ensemble 28%. Duration: 172s, 1280x720, 29.97fps, h264; audio: 172s, stereo, 48kHz, AAC.
The video titled “Anderson Cooper, 4K Original/(Deep) Fake Example - LipSynthesis”, scanned 2025-09-20 at 06:31:05 UTC. SUSPICIOUS. Model results: Avatarify 72% (suspicious), Deepware 0%, Seferbekov 3%, Ensemble 0%. Duration: 210s, 3840x2160, 30fps, h264; audio: 210s, stereo, 48kHz, AAC.
The video titled “Fake Obama created using AI video tool - BBC News”, scanned 2025-09-20 at 06:34:42 UTC. DEEPFAKE DETECTED. Model results: Analyst confirmed deepfake, Avatarify 19%, Deepware 0%, Seferbekov 49%, Ensemble 12%. Duration: 200s, 1920x1080, 29.97fps, h264; audio: 200s, stereo, 48kHz, AAC.
The video titled “This is not Morgan Freeman - A Deepfake Singularity”, scanned 2025-09-20 at 06:37:10 UTC. DEEPFAKE DETECTED. Model results: Analyst deepfake detected, Avatarify 18%, Deepware 0%, Seferbekov 0%, Ensemble 0%. Duration: 198s, 1920x1080, 29.97fps, h264; audio: 198s, stereo, 48kHz, AAC.
The video titled “President Joe Biden's Magical Pistachio Story (Deepfake AI)”, scanned 2025-09-20 at 06:39:55 UTC. SUSPICIOUS. Model results: Avatarify 29%, Deepware 34%, Seferbekov 71% (suspicious), Ensemble 58% (suspicious). Duration: 185s, 1280x720, 29.97fps, h264; audio: 185s, stereo, 48kHz, AAC.
The video titled “Deepfake example. Original/Deepfake close shot Bill Gates.”, scanned 2025-09-20 at 06:42:12 UTC. NO DEEPFAKE DETECTED. Model results: Avatarify 20%, Deepware 0%, Seferbekov 2%, Ensemble 0%. Duration: 180s, 1280x720, 29.97fps, h264; audio: 180s, stereo, 48kHz, AAC.
The video titled “2024 Deepfake Example in 4k - ORIGINAL/DEEPFAKE - Bill Gates”, scanned 2025-09-20 at 06:45:00 UTC. NO DEEPFAKE DETECTED. Model results: Avatarify 20%, Deepware 0%, Seferbekov 2%, Ensemble 0%. Duration: 182s, 3840x2160, 30fps, h264; audio: 182s, stereo, 48kHz, AAC.
The video titled “Trump 4k Deepfake example – LipSynthesis”, scanned 2025-09-20 at 06:48:33 UTC. NO DEEPFAKE DETECTED. Model results: Avatarify 40%, Deepware 2%, Seferbekov 1%, Ensemble 1%. Duration: 190s, 3840x2160, 30fps, h264; audio: 190s, stereo, 48kHz, AAC.
All Real Video Results:
All real videos scanned in September 2025. Deepware A.I. is in Beta; results are advisory only.
All real videos (see list above) were marked as NO DEEPFAKE DETECTED by all models (Avatarify, Deepware, Seferbekov, Ensemble), with low probabilities (0-35%). Video/audio specs varied; see supplementary for details.
Tool 2: DeepFake-O-Meter (UB Media Forensics Lab)
This project is supported by the University at Buffalo and the National Science Foundation (SaTC-2153112). The tool aggregates many advanced models for deepfake detection. See below for model list and results.
Amit Shah Fake Video: DSP-FWA 96.1%, FTCN 0.4%, WAV2LIP-STA 48.3%, SBI 22.4%, XCLIP 71.4%, AltFreezing 16.7%, TALL 98.8%, LIPINC No Lip Movement, LSDA 74.2%, AVSRDD 99.7%, CFM 38.2%. Second run: similar scores. High likelihood of being fake by most advanced models.
The next outbreak_ We’re not ready _ Bill Gates _ TED.mp4: AVSRDD 100%, TALL 98.8%, XCLIP 98.9%, LIPINC 95.3%. WAV2LIP-STA 50.7%, CFM 42.1%, AVAD 40.0%. LSDA 32.8%, DSP-FWA 21.4%, SBI 15.6%, AltFreezing 9.4%, FTCN 0.4%. Advanced models indicate fake, older models do not.
Note: These percentages reflect statistical correlations with real and fake samples in training datasets and should not be interpreted as definitive evidence of authenticity or fabrication. See supplementary for full per-model breakdowns and references.
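Figures 2 and 3 compare each clip's Deepware ensemble score against a mean of the DeepFake-O-Meter per-model scores. The sketch below (Python, illustrative only) shows one plausible way to form that summary for the Amit Shah entry listed above; treating LIPINC's non-numeric "No Lip Movement" output as missing is our assumption, since the report does not say how such outputs were handled.

```python
from statistics import mean

# DeepFake-O-Meter per-model scores (%) for the Amit Shah clip, as listed above.
# None marks LIPINC's non-numeric "No Lip Movement" output (our handling choice).
ub_scores = {
    "DSP-FWA": 96.1, "FTCN": 0.4, "WAV2LIP-STA": 48.3, "SBI": 22.4,
    "XCLIP": 71.4, "AltFreezing": 16.7, "TALL": 98.8, "LIPINC": None,
    "LSDA": 74.2, "AVSRDD": 99.7, "CFM": 38.2,
}
deepware_ensemble = 67.0  # Deepware ensemble score (%) for the same clip

numeric = [s for s in ub_scores.values() if s is not None]
ub_mean = mean(numeric)                    # mean detector score (Figures 2-3)
score_range = max(numeric) - min(numeric)  # spread across UB models
platform_gap = abs(ub_mean - deepware_ensemble)

print(f"UB mean: {ub_mean:.1f}%  UB model range: {score_range:.1f}%  "
      f"gap vs Deepware ensemble: {platform_gap:.1f}%")
```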
📈 Performance by Content Type
Face-swap deepfakes (Obama, Morgan Freeman) produced higher detection signals than lip-synthesis manipulations
Political content showed mixed results with occasional false positives on authentic speeches
Cultural factors affected detection - Indian entertainment deepfakes showed inconsistent patterns
Technical factors like resolution and compression affected detection consistency
⚠️ Critical Implications
Bottom Line: Current public detectors provide useful signals but are not reliable enough to serve as sole arbiters of authenticity for viral content. The high specificity (few false alarms) comes at the cost of poor sensitivity (missing many real deepfakes).
🔬 Technical Factors Affecting Detection
Content Type Impact:
Face-swap/identity-replacement deepfakes performed better than lip-synthesis
Political content showed mixed results with occasional false positives on authentic speeches
Cultural context affected performance (Indian entertainment deepfakes showed inconsistent patterns)
Technical Specifications:
Higher resolution content (3840×2160) sometimes reduced detection consistency
Lower resolution clips (480×360) elicited more consistent flags among advanced models
Video compression and multi-stage postprocessing masked detector cues
Platform-specific compression from viral sharing affected artifacts
⚖️ Operational Implications
High-Stakes Contexts: Should NOT be used as sole arbiters for legal evidence, election monitoring, or content takedown decisions
Recommended Use: As triage tools flagging material for human expert review
Transparency Requirements: Publishers must document thresholds, model versions, analyst interventions, and scan timestamps (a sketch of such a scan record follows this list)
Threshold Sensitivity: Performance claims meaningless without specifying binary mapping strategy
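To make the transparency requirement concrete, here is a hypothetical per-scan record layout (a sketch, not a format prescribed by either platform), populated with the BBC Obama entry from the Deepware results above; the model_versions value is a placeholder, since the public scans did not expose versioned identifiers.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class ScanRecord:
    """Per-video scan log covering the transparency items listed above."""
    video_title: str
    platform: str                      # e.g. "Deepware AI" or "DeepFake-O-Meter"
    scan_timestamp_utc: str            # when the scan was run
    model_versions: Dict[str, str]     # versioned identifier per detector model
    raw_scores: Dict[str, float]       # raw per-model numeric outputs (%)
    categorical_result: str            # e.g. "DEEPFAKE DETECTED"
    binary_mapping: str                # "A (lenient)" or "B (conservative)"
    threshold: Optional[float] = None  # decision threshold, if one was applied
    analyst_intervention: str = ""     # any manual override or review note

record = ScanRecord(
    video_title="Fake Obama created using AI video tool - BBC News",
    platform="Deepware AI",
    scan_timestamp_utc="2025-09-20T06:34:42Z",
    model_versions={"Seferbekov": "unversioned (public beta)"},  # placeholder
    raw_scores={"Avatarify": 19, "Deepware": 0, "Seferbekov": 49, "Ensemble": 12},
    categorical_result="DEEPFAKE DETECTED",
    binary_mapping="B (conservative)",
    analyst_intervention="Analyst confirmed deepfake",
)
print(record.video_title, "->", record.categorical_result)
```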
🎯 Conclusion
Our analysis demonstrates that currently accessible detection tools offer useful signals but remain insufficiently reliable for fully automated judgments on viral real-world videos. While these tools showed excellent specificity (correctly identifying authentic content), their sensitivity varied dramatically depending on operational thresholds and content characteristics.
The substantial disagreement both within and across platforms points to deeper methodological issues. Current detectors respond to different artifact signatures rather than converging on robust indicators of synthetic origin, leading to situations where identical content produces near-certain and near-zero likelihood scores depending on the model consulted.
Key Recommendations:
Automated detectors should be used as triage tools paired with human expert review, not as sole arbiters
Transparency about thresholds, model versions, and analyst interventions is essential
Research should focus on principled ensemble weighting, model calibration, and domain-adaptive training
Evaluation protocols must reflect the messy, compressed, and culturally varied media found in real-world circulation
Until significant improvements are realized, automated detection should be used cautiously and as part of a broader, human-supervised verification workflow to protect against misinformation while avoiding false accusations.
Study Limitations & Future Directions
Current Study Limitations
Sample Size: Limited to 20 high-profile viral clips, which constrains statistical power and may not represent the full diversity of real-world manipulations
Ground Truth Verification: Established via public debunking reports and media documentation rather than direct access to generation artifacts
Temporal Snapshot: Results reflect detection model capabilities at specific time points (April 2024 - September 2025) since models evolve rapidly
Cultural/Language Scope: Focused primarily on English-language and Western/Indian content
Statistical Approach: Emphasized descriptive metrics rather than inferential testing due to modest sample size
Future Research Directions
Enhanced Evaluation
Expand testing to more diverse content (different languages, cultures, generation methods)
Create standardized benchmarks for real-world deepfake evaluation
Develop cross-dataset evaluation protocols that capture distributional diversity
Focus on principled ensemble weighting and model calibration
Implement domain-adaptive training for robustness across content types
Improve model explainability to help experts interpret detection decisions
Transparency & Standards
Establish requirements for publishing raw per-video numeric outputs
Mandate versioned model identifiers and scan timestamps
Document all analyst interventions and threshold choices
Create industry standards for detection platform transparency
Real-World Application
Integrate human expert review workflows with automated detection
Develop policies for high-stakes contexts (legal evidence, election monitoring)
Address compression artifacts and multi-stage sharing effects
Study demographic and cultural biases in detection systems
Frequently Asked Questions
Why does this research matter?
With AI-generated deepfakes spreading rapidly on social media, people need to know whether they can trust the online detection tools that claim to identify fake videos. Our research tests whether these popular tools actually work on real viral content, revealing serious limitations that could affect how we combat misinformation.
What is new about this work?
Previous studies have mostly tested detection models on controlled, laboratory-created datasets. We are the first to systematically evaluate public detection tools on actual viral videos that circulated on social media - including deepfakes of Obama, Biden, Trump, and Bollywood celebrities. This real-world testing reveals problems that lab studies missed.
What are the limitations of this study?
Our sample was limited to 20 high-profile viral videos, which may not represent all types of deepfakes. Ground truth was established through public debunking reports rather than direct access to generation tools. Results reflect a snapshot in time, since detection models evolve rapidly. We also focused on English-language and primarily Western/Indian content.
Can others reproduce your results?
We provide complete video lists, platform URLs (Deepware.ai and DeepFake-O-Meter), and our evaluation methodology in the paper. Since these are web-based tools, anyone can test the same videos we used. However, results may vary over time as the platforms update their models.
What are the next steps for this research?
We plan to expand testing to more diverse content (different languages, cultures, and generation methods), develop better consensus algorithms that combine multiple detectors, and work with platform developers to improve transparency about how their tools work. We also want to create standardized benchmarks for real-world deepfake evaluation.
Should people rely on these detection tools?
Use them as helpful hints, not definitive answers. Our research shows they miss many real deepfakes (low sensitivity) but rarely call real videos fake (high specificity). For important decisions - like news verification or legal evidence - always combine automated tools with human expert review and multiple sources of verification.
🔗 Related Work
Our research builds upon extensive prior work in deepfake detection while addressing a critical gap: most studies evaluate models on controlled laboratory datasets rather than real-world viral content. This work bridges that gap by testing public tools on authentic viral media.
DSP-FWA (2019) & FTCN (2021) - Early detection algorithms that showed promise on controlled datasets but limited real-world performance
Deepfake-Eval-2024 - Recent benchmark showing 50% AUC drops when models face in-the-wild content, confirming our hypothesis
Tolosana et al. (2020) - Comprehensive survey of face manipulation techniques that informed our understanding of deepfake generation methods
UB Media Forensics Lab - Developers of DeepFake-O-Meter platform that enabled our multi-model evaluation approach
Novel Contribution: Unlike previous studies that focus on algorithmic improvements, we provide the first systematic evaluation of publicly accessible detection tools on viral real-world content, revealing critical limitations for practical deployment.
Technical Resources
📈
Extended Results
Additional experiments and detailed analysis not included in the main paper.
To reproduce our benchmarking study, researchers need access to the same detection platforms and video samples we used. Since our study evaluates publicly available tools on viral content, the main requirements are platform access and careful documentation of scan parameters.
Step-by-Step Instructions
1
Platform Access
Obtain access to detection platforms:
Deepware AI Scanner: https://deepware.ai (free beta access)
DeepFake-O-Meter (UB Media Forensics Lab): https://zinc.cse.buffalo.edu/ubmdfl/deep-o-meter/landing_page
Our study used 20 carefully selected viral videos (10 deepfakes + 10 authentic) featuring prominent public figures. All samples are available for research replication:
Complete Dataset Download
Download all 20 video samples (deepfakes + authentic) in a single compressed archive
We thank Mrs. Sirisha Vadigineni (Narayana E-Techno and Olympiad School, Whitefield, Bengaluru) and Mrs. Ramya Shujith (Glentree Academy, Whitefield, Bengaluru) for their guidance and support during this research.
This work utilized the Deepware Scanner https://deepware.ai and the DeepFake-O-Meter platform https://zinc.cse.buffalo.edu/ubmdfl/deep-o-meter/landing_page, which provided accessible and effective tools for detecting synthetic media. We acknowledge the contributions of the teams behind these tools, including the UB Media Forensics Lab, supported by the University at Buffalo and the National Science Foundation under Grant SaTC-2153112.
Contact & Collaborate
Interested in our deepfake detection research? Have questions about our methodology or want to collaborate on misinformation detection? We'd love to hear from you!