FileScope
Intelligent file analyzer that classifies, audits, and cleans local files using machine learning algorithms.
Tech Stack
Key Highlights
TF-IDF + Naive Bayes for text classification
ResNet50 feature extraction for image analysis
MD5 hash-based duplicate detection
Multi-threaded Tkinter GUI with progress tracking
Safe file operations with undo functionality
Exportable CSV reports and audit logs
Project Details
I built a desktop app that classifies, audits, and cleans local files. It uses TF-IDF + Naive Bayes for text and ResNet50 features for images, then lets you compress, archive, or delete files safely through a Tkinter UI.
Multimodal classification:
**Text:** TF-IDF vectorization → Multinomial Naive Bayes (topic/doctype labels).
**Images:** ResNet50 (pretrained) feature extractor → lightweight classifier (LogReg/SVM).
**Batch scan & insights:** Recursively indexes folders, extracts metadata (size, type, mtime), computes MD5 hashes to detect duplicates, and surfaces large, old, or rarely opened files as cleanup candidates.
**Interactive GUI:** Tkinter table with filters, preview pane, progress bars, cancel-safe scanning, and one-click actions (compress to ZIP, move to archive, safe delete to OS trash).
**Quality & reporting:** Confusion matrix, precision/recall/F1, and per-class support for the classifiers; exportable CSV of findings, plus an actions log for auditability.
**Safety rails:** Dry-run mode, undo queue, permission checks, integrity verification after compress/move.
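To make the safety rails more concrete, here is a minimal sketch of how dry-run mode, an undo queue, and a post-move integrity check could fit together for the "move to archive" action. It is an illustration, not FileScope's actual code: the names `SafeMover`, `ArchiveAction`, `archive`, and `undo_last` are hypothetical, and the integrity check here is a simple size comparison rather than whatever verification the app performs.

```python
import shutil
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ArchiveAction:
    """Record of one completed move, kept so it can be reversed."""
    source: Path
    destination: Path


@dataclass
class SafeMover:
    """Moves files into an archive folder with dry-run and undo support."""
    archive_dir: Path
    dry_run: bool = True  # assumption: default to the safe mode
    undo_stack: list = field(default_factory=list)

    def archive(self, path: Path) -> Path:
        destination = self.archive_dir / path.name
        if destination.exists():
            raise FileExistsError(f"refusing to overwrite {destination}")
        if self.dry_run:
            # Report the planned target only; nothing on disk changes.
            return destination
        size_before = path.stat().st_size
        self.archive_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(destination))
        # Minimal integrity check: the moved file must keep its size.
        if destination.stat().st_size != size_before:
            raise OSError(f"size mismatch after moving {path}")
        self.undo_stack.append(ArchiveAction(path, destination))
        return destination

    def undo_last(self) -> bool:
        """Reverse the most recent move; return False if there is none."""
        if not self.undo_stack:
            return False
        action = self.undo_stack.pop()
        shutil.move(str(action.destination), str(action.source))
        return True
```

With `dry_run=True`, `archive()` only returns where the file would go, so a UI can preview the plan before anything moves. Setting `dry_run=False` performs the move and records it, so `undo_last()` can put the file back.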
My contributions:
• Implemented the text & image pipelines, feature caching, and model persistence; wrote the duplicate finder (hash + size heuristics).
• Built the Tkinter UI (virtualized table, preview, progress), multi-threaded scanning worker, and action handlers with rollback.
• Authored evaluation scripts and reporting (metrics, CSV export), plus config profiles for "aggressive" vs "conservative" cleanup.
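The duplicate finder mentioned in the contributions (hash + size heuristics) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the project's implementation: the function names `md5_of_file` and `find_duplicates` are hypothetical, and the only size heuristic shown is "hash a file only if another file has the same byte size", which avoids reading files that cannot have a duplicate.

```python
import hashlib
import os
from collections import defaultdict
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return groups of paths under root that share both size and MD5."""
    # Pass 1: group regular files by size (cheap, metadata only).
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_file() and not path.is_symlink():
                by_size[path.stat().st_size].append(path)

    # Pass 2: hash only files whose size matches at least one other file.
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[(size, md5_of_file(path))].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Calling `find_duplicates("some/folder")` returns a list of groups, each a sorted list of `Path` objects with identical content hashes. MD5 is adequate for spotting accidental duplicates, but it is not collision-resistant against deliberately crafted files, so a byte-for-byte comparison before deletion would be a reasonable extra safeguard.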