Introducing SIMS v2t — Portable Video & Audio to Text for Your Desktop
March 27, 2026
by SIMS Tech
Today we're releasing SIMS v2t v1.0 — a free, open-source desktop application that converts video files, audio recordings, and YouTube URLs into plain text transcripts.
A New Addition to the SIMS Tech Portfolio
SIMS v2t is a portable Tauri desktop app built with Rust and React. Drop a video file, paste a YouTube URL, or add a folder — and get a .txt transcript. Supports both cloud API (OpenAI-compatible) and fully offline transcription via whisper.cpp. No account required for offline mode. Windows, macOS, Linux. MIT License.
Why We Built This
Every time we needed to transcribe something — a recorded meeting, a YouTube tutorial, an interview — the options were the same: upload to a cloud service and hope your data is handled responsibly, run a complex command-line tool, or pay a per-minute subscription fee.
None of those options felt right for a tool you'd actually want to use daily.
We wanted something you could put in a folder and run. Something that works offline when needed. Something that handles a YouTube playlist as easily as a single local file. Something with a proper GUI that doesn't require you to read documentation to start it.
That's SIMS v2t.
What It Does
📁 Drag & Drop Queue
Drop files directly. Paste YouTube URLs. Pick a folder for batch processing — v2t scans for all video and audio formats recursively. Jobs run sequentially with live status and log output.
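For readers curious what a recursive folder scan like this might look like, here is a minimal Rust sketch using only the standard library. It is an illustration, not the actual v2t source: the extension list and the names `MEDIA_EXTENSIONS`, `is_media_file`, and `scan_media` are assumptions.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical extension list; the formats v2t actually accepts may differ.
const MEDIA_EXTENSIONS: &[&str] = &["mp4", "mkv", "mov", "webm", "mp3", "wav", "m4a", "flac"];

/// Returns true if the path has one of the media extensions (case-insensitive).
fn is_media_file(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| MEDIA_EXTENSIONS.contains(&ext.to_ascii_lowercase().as_str()))
        .unwrap_or(false)
}

/// Walks `dir` recursively and returns all media files, sorted by path.
fn scan_media(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    // Explicit stack of directories still to visit (avoids recursion).
    let mut pending = vec![dir.to_path_buf()];
    while let Some(current) = pending.pop() {
        for entry in fs::read_dir(&current)? {
            let entry = entry?;
            let path = entry.path();
            if entry.file_type()?.is_dir() {
                pending.push(path);
            } else if is_media_file(&path) {
                found.push(path);
            }
        }
    }
    found.sort();
    Ok(found)
}
```

Sorting the result gives the queue a stable, predictable order regardless of how the operating system lists directory entries.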
🌐 YouTube & Playlist Support
Paste a YouTube link — single video or full playlist. yt-dlp handles extraction. A playlist produces one transcript per video. Optionally save the original video file alongside the transcript.
☁️ Cloud API Mode
Connect to any OpenAI-compatible endpoint — OpenAI Whisper, Groq, local LM Studio, or a self-hosted service. Configure base URL and model. API key stored in OS keychain, never in a plain text file.
🖥️ Fully Offline Mode
Switch to whisper.cpp local mode. Choose a model (tiny to large-v3-turbo), download it once with one click. From that point on — transcription with no internet connection, no API key, no cost per minute.
⬇️ One-Click Tool Setup
A single button downloads and installs ffmpeg and yt-dlp on Windows and macOS. whisper-cli is auto-detected via Homebrew on macOS, or downloaded from the official release on Windows. No manual setup for most users.
🔄 Retry & Resume
API errors retry automatically with backoff. Large files split into chunks — each checkpointed. If a job is interrupted halfway through a 3-hour recording, it resumes from the last completed chunk.
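Retry with backoff can be sketched in a few lines of Rust. This is a generic exponential-backoff helper under stated assumptions, not the v2t implementation: the doubling schedule, the cap, and the names `backoff_delay` and `retry_with_backoff` are illustrative, and the post does not say which errors are retried or how many times.

```rust
use std::thread;
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(max)
}

/// Runs `op` until it succeeds or `max_attempts` calls have been made,
/// sleeping with exponential backoff between failures.
/// `op` is always called at least once.
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            Err(_) => {
                thread::sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
        }
    }
}
```

Capping the delay keeps a long outage from stretching waits indefinitely, while the doubling gives a struggling API room to recover.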
Two Transcription Modes
The core design decision in v2t is offering two independent paths for getting text from audio:
Cloud API Mode
You configure a Whisper-compatible API endpoint (OpenAI, Groq, a self-hosted instance, local LM Studio). v2t sends audio as a multipart HTTP request and gets text back. The API key lives in the OS keychain — Windows Credential Manager or macOS Keychain — and is never written to any configuration file.
ffmpeg automatically splits files larger than 22 MB into 8-minute chunks. The chunks are transcribed in sequence and their text is reassembled into a single output file, so the split is transparent to the user.
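To make the arithmetic concrete, here is a small Rust sketch of how a chunk plan could be computed from the two numbers above. It is a sketch under assumptions, not the app's code: the `Chunk` struct and `plan_chunks` function are hypothetical, and "22 MB" is taken to mean 22 × 1024 × 1024 bytes.

```rust
/// Size threshold from the post: files above 22 MB are split.
/// Assumes MB means 1024 * 1024 bytes; the app may count differently.
const SPLIT_THRESHOLD_BYTES: u64 = 22 * 1024 * 1024;
/// Chunk length from the post: 8 minutes.
const CHUNK_SECONDS: u64 = 8 * 60;

/// One slice of the input, described by its start offset and length in seconds.
#[derive(Debug, PartialEq)]
struct Chunk {
    index: usize,
    start_secs: u64,
    length_secs: u64,
}

/// Returns a single chunk for small files, or consecutive 8-minute
/// chunks (the last one possibly shorter) for files above the threshold.
fn plan_chunks(file_bytes: u64, duration_secs: u64) -> Vec<Chunk> {
    if file_bytes <= SPLIT_THRESHOLD_BYTES {
        return vec![Chunk { index: 0, start_secs: 0, length_secs: duration_secs }];
    }
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < duration_secs {
        let length = CHUNK_SECONDS.min(duration_secs - start);
        chunks.push(Chunk { index: chunks.len(), start_secs: start, length_secs: length });
        start += length;
    }
    chunks
}
```

With these numbers, a 3-hour recording (10,800 seconds) above the threshold yields 23 chunks: 22 full 8-minute chunks and a final 4-minute one.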
Local Offline Mode
Select a whisper.cpp model in Settings (tiny, base, small, medium, or large-v3-turbo). v2t downloads the model file from the official Hugging Face repository and verifies the SHA-1 checksum before marking it ready. After that, all transcription happens locally — ffmpeg normalizes audio, whisper-cli processes it, v2t reads the output. No network traffic, no API calls, no ongoing cost.
Real-time progress comes from parsing whisper-cli's stderr output: the percentage advances as the model processes audio. The same chunking and resume logic applies as in cloud mode.
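Parsing progress out of stderr might look something like the sketch below. It assumes progress lines of the form `whisper_print_progress_callback: progress =  40%`; the exact format depends on the whisper.cpp version and flags, and the function names `parse_progress` and `latest_progress` are hypothetical.

```rust
/// Extracts a percentage from a stderr line such as
/// "whisper_print_progress_callback: progress =  40%".
/// The line format is an assumption and may differ between whisper.cpp versions.
fn parse_progress(line: &str) -> Option<u8> {
    let (_, rest) = line.split_once("progress =")?;
    let digits = rest.trim().strip_suffix('%')?;
    let value: u8 = digits.trim().parse().ok()?;
    // Reject anything outside the 0..=100 range.
    (value <= 100).then_some(value)
}

/// Returns the most recent progress value found in a block of stderr text.
fn latest_progress(stderr: &str) -> Option<u8> {
    stderr.lines().filter_map(parse_progress).last()
}
```

Lines that do not match simply return `None`, so ordinary log output from the tool can pass through the same reader without special handling.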
Technical Choices
SIMS v2t is built with Tauri 2 — Rust backend running as a native process, React + TypeScript frontend rendered in a system webview. This means the application is genuinely native (not Electron), with a small binary footprint and access to system APIs.
The Rust backend handles:
- Process spawning for ffmpeg, yt-dlp, and whisper-cli with proper cancellation (CancellationToken + process tree kill)
- HTTP requests to the transcription API via reqwest with streaming progress
- File I/O, chunk management, and resume logic
- OS keyring integration for API key storage
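As an illustration of the resume logic in the list above, the sketch below works out which chunks still need transcription after an interruption. It is not the actual v2t code: the checkpoint format (one completed chunk index per line) and the names `parse_checkpoint` and `pending_chunks` are assumptions.

```rust
use std::collections::BTreeSet;

/// Parses a checkpoint body with one completed chunk index per line.
/// This on-disk format is hypothetical; blank or malformed lines are ignored.
fn parse_checkpoint(body: &str) -> BTreeSet<usize> {
    body.lines()
        .filter_map(|line| line.trim().parse::<usize>().ok())
        .collect()
}

/// Returns the indices of chunks that still need transcription, in order.
fn pending_chunks(total: usize, done: &BTreeSet<usize>) -> Vec<usize> {
    (0..total).filter(|index| !done.contains(index)).collect()
}
```

For example, with six chunks and a checkpoint listing 0, 1, and 2, the job would pick up again at chunk 3.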
The React frontend handles:
- Queue management and status display
- Settings UI with separate tabs for queue and configuration
- Real-time log streaming from Tauri events
- Download progress for tools and models
All external processes are called with explicit argument arrays — no shell interpolation of user input — which eliminates a class of injection vulnerabilities common in tools that build command strings.
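In Rust, the explicit-argument approach maps naturally onto `std::process::Command`. The sketch below builds (but does not run) an ffmpeg invocation; the function name `ffmpeg_normalize_command` and the particular flags (16 kHz mono WAV output) are illustrative assumptions, not v2t's actual command line.

```rust
use std::path::Path;
use std::process::Command;

/// Builds an ffmpeg invocation that converts `input` to 16 kHz mono audio.
/// The specific flags are an illustrative guess, not v2t's actual command line.
fn ffmpeg_normalize_command(input: &Path, output: &Path) -> Command {
    let mut cmd = Command::new("ffmpeg");
    cmd.arg("-y") // overwrite the output file without prompting
        .arg("-i")
        .arg(input) // passed as a single argument, never parsed by a shell
        .args(["-ar", "16000", "-ac", "1"])
        .arg(output);
    cmd
}
```

Because each argument is handed to the process directly, a file named `talk; rm -rf ~ .mp4` reaches ffmpeg as one literal path. No shell ever sees the semicolon, so there is nothing to escape.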
Who Is This For
Honestly, anyone who regularly deals with audio or video content and needs the text:
- Researchers transcribing interview recordings without uploading them to external services
- Journalists turning recorded sources into searchable text
- Content creators generating transcripts from YouTube videos for repurposing
- Teams processing meeting recordings in batch
- Developers who want a reliable transcription pipeline without building one
The offline mode makes it particularly useful for anyone handling sensitive recordings that shouldn't leave the local machine.
Getting Started
Download from GitHub Releases. The first-run Setup Guide walks through:
- Setting an output folder
- Downloading ffmpeg + yt-dlp (one button on Windows/macOS)
- Choosing transcription mode — cloud API or local whisper
- Entering an API key or downloading a whisper model
After that: drag a file or paste a URL, click Start, get your transcript.
Open Source
SIMS v2t is MIT-licensed. The full source is on GitHub. Contributions welcome.
Links
- 🐙 GitHub (source + releases): github.com/vglu/v2t
- 🌐 Product page: sims-service.com/products/sims-v2t
- 📧 Contact: vhlu@sims-service.com
🎙️ Stop copying text from video manually.
Drag a file. Paste a URL. Get your transcript. Offline or cloud. Free & Open Source (MIT).