
LiberSonora
An AI-powered, robust, open-source audiobook toolkit with intelligent subtitle extraction, AI title generation, and multilingual translation.
What is LiberSonora
How to Use
Step 1: Installation
Clone the repository, then start the services with Docker Compose by running 'docker-compose -f docker-compose.gpu.yml up -d'.
Step 2: Configuration
Configure your LLM preferences and API settings in the configuration file.
Step 3: Processing
Upload your audiobook files and use the web interface to initiate processing tasks.
Step 4: Export
Download the processed subtitles, translations, and metadata for use with your audiobook collection.
Core Features
Multi-model Selection and Integration
LiberSonora lets users choose among multiple large language model (LLM) backends, including local Ollama, DeepSeek, and OpenAI, so the model can be matched to specific needs and resource constraints. For offline environments, the system supports fully local deployment using Ollama, which preserves privacy and allows operation in air-gapped settings. Users with existing DeepSeek or OpenAI subscriptions can use those services through simple configuration changes.
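As a rough illustration of what selecting a backend involves, the sketch below maps a provider name to an endpoint and checks whether an API key is needed. The `LLMSettings` class and `make_settings` function are hypothetical names invented for this example and are not LiberSonora's actual configuration schema; the base URLs are the publicly documented defaults for each service.

```python
from dataclasses import dataclass
from typing import Optional

# Publicly documented default endpoints for each service.
DEFAULT_BASE_URLS = {
    "ollama": "http://localhost:11434",
    "deepseek": "https://api.deepseek.com",
    "openai": "https://api.openai.com/v1",
}


@dataclass(frozen=True)
class LLMSettings:
    """Hypothetical settings shape; the real configuration keys may differ."""
    provider: str
    base_url: str
    api_key: Optional[str] = None

    @property
    def is_local(self) -> bool:
        # Only the Ollama backend runs on the local machine.
        return self.provider == "ollama"


def make_settings(provider: str, api_key: Optional[str] = None) -> LLMSettings:
    """Validate the provider name and build settings for it."""
    name = provider.lower()
    if name not in DEFAULT_BASE_URLS:
        raise ValueError(f"unsupported provider: {provider!r}")
    if name != "ollama" and not api_key:
        raise ValueError(f"{name} requires an API key")
    return LLMSettings(name, DEFAULT_BASE_URLS[name], api_key)
```

With this shape, `make_settings("ollama")` needs no key and reports `is_local` as true, while the hosted providers fail early if the key is missing.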
Intelligent Subtitle Extraction
The platform uses automatic speech recognition (ASR) to extract subtitles from audiobook files. Audio preprocessing handles the varying audio quality, background noise, and speaker variation commonly found in audiobooks. Timing alignment matches each line of text to its position in the audio, producing synchronized subtitle files. Batch processing of multiple audio files makes subtitle extraction practical for entire audiobook collections.
AI-Powered Title Generation
LiberSonora uses natural language processing to analyze subtitle content and generate descriptive titles for audiobook files. The title generator draws on the context, themes, and key elements of the audio content so that each title reflects the material. This addresses the common problem of unnamed or poorly named audiobook files and improves the organization and searchability of collections. Users can specify naming conventions and formats to keep titles consistent across an entire audiobook library.
Multilingual Translation Support
The platform offers comprehensive multilingual capabilities for both subtitle extraction and translation. The subtitle system can detect and process multiple languages, while the translation features allow conversion between numerous language pairs. The translation engine maintains context awareness across paragraphs and chapters, preserving the narrative flow and specialized terminology throughout the audiobook. Users can select specific translation models optimized for literary content, ensuring higher quality results for audiobook material.
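One common way to keep a translation context-aware is to send each paragraph together with the few paragraphs that precede it. The sketch below shows that sliding-window idea only; it is an assumption about the general technique, not a description of how LiberSonora implements it, and `with_context` and `build_prompt` are hypothetical helper names.

```python
from typing import Iterator, List, Tuple


def with_context(paragraphs: List[str], window: int = 2) -> Iterator[Tuple[List[str], str]]:
    """Yield (preceding_paragraphs, paragraph) pairs with at most `window` items of context."""
    for i, paragraph in enumerate(paragraphs):
        yield paragraphs[max(0, i - window):i], paragraph


def build_prompt(context: List[str], paragraph: str, target_language: str) -> str:
    """Assemble a translation prompt; the wording here is purely illustrative."""
    lines = [f"Translate the final paragraph into {target_language}."]
    if context:
        lines.append("Earlier paragraphs, for context only:")
        lines.extend(context)
    lines.append("Paragraph to translate:")
    lines.append(paragraph)
    return "\n".join(lines)
```

A larger `window` gives the model more narrative context per request at the cost of longer prompts.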
Complete Offline Operation
LiberSonora is designed to function entirely offline after initial setup. All audio processing, subtitle extraction, and translation happen locally on your machine using the selected AI models. No audio files or content are uploaded to external servers unless you specifically configure an external API such as OpenAI. With a local model, this design keeps your data on your own machine and makes LiberSonora suitable for sensitive content.
Integration Capabilities
API and Service Integration
RESTful API endpoints for integrating LiberSonora's capabilities into existing audiobook management workflows.
Docker Containerization
Complete Docker implementation for easy deployment and consistent environment across different systems.
Multi-LLM Support
Flexible integration with various large language models including local Ollama, DeepSeek, and OpenAI.
Subtitle Format Support
Export capabilities for various subtitle formats including SRT, VTT, and JSON for maximum compatibility.
Batch Processing Interface
Programmatic batch processing capabilities for handling large audiobook collections efficiently.
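A batch run typically begins by collecting every audio file under a folder. The sketch below does that with the standard library; `find_audio_files` is a hypothetical helper, and the extension set simply mirrors the formats named in the FAQ rather than an official list.

```python
from pathlib import Path
from typing import AbstractSet, List

# Assumed extensions for the formats named in the FAQ (MP3, WAV, FLAC, AAC, OGG).
AUDIO_EXTENSIONS = frozenset({".mp3", ".wav", ".flac", ".aac", ".ogg"})


def find_audio_files(root: Path, extensions: AbstractSet[str] = AUDIO_EXTENSIONS) -> List[Path]:
    """Recursively collect audio files under `root`, matching extensions case-insensitively."""
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in extensions
    )
```

Sorting the result gives a stable processing order, which makes interrupted batches easier to resume.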
Use Cases
Audiobook Archive Management
Libraries and archivists use LiberSonora to process large collections of audiobooks, extracting subtitles and generating accurate metadata. This makes content searchable, makes it accessible to people who are deaf or hard of hearing, and enriches catalog information. The batch processing capabilities enable efficient management of extensive collections.
Educational Content Creation
Educators use LiberSonora to create accessible versions of audio lectures and educational materials. By extracting subtitles and translating content into multiple languages, educational resources become available to diverse audiences, including international students and those with hearing disabilities.
Personal Audiobook Library Organization
Individual audiobook enthusiasts use LiberSonora to organize personal collections by generating accurate titles and extracting content details. This makes finding specific content easier and enables text-based searching through audio collections. The offline capabilities ensure privacy for personal collections.
Multilingual Content Distribution
Content creators and publishers use LiberSonora to prepare audiobooks for international distribution. By extracting and translating subtitles, they can quickly create multilingual versions of their content, expanding their potential audience without requiring new recordings.
FAQ
Q: What hardware requirements does LiberSonora have?
A: LiberSonora benefits from GPU acceleration, particularly for processing multiple files in batch mode. The system works with NVIDIA GPUs supporting CUDA. For optimal performance, we recommend at least 8GB of GPU memory and 16GB of system RAM. However, the application can also run on CPU-only systems with reduced performance. Docker is required for the containerized environment.
Q: How does LiberSonora ensure my data privacy?
A: LiberSonora can operate completely offline after initial setup. With a local model selected, all audio processing, subtitle extraction, and translation run on your own machine. Audio files and content leave your machine only if you specifically configure an external API such as OpenAI, which makes the local setup suitable for sensitive content.
Q: Which audio formats are supported?
A: LiberSonora supports most common audio formats including MP3, WAV, FLAC, AAC, and OGG. The system uses FFmpeg internally, so it inherits support for the wide range of formats that FFmpeg can process. For optimal results, uncompressed or lossless formats like WAV or FLAC provide the best quality for subtitle extraction, though compressed formats like MP3 work well in most cases.
Q: How accurate is the subtitle extraction?
A: The accuracy of subtitle extraction depends on several factors including audio quality, speaking clarity, background noise, and the chosen AI model. Under good conditions (clear speech, minimal background noise), accuracy rates of 90-95% are typical. The system performs best with professional audiobook recordings but can handle varying qualities. Users can also manually edit extracted subtitles through the interface if necessary.
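The answer above does not say which metric the 90-95% figure refers to. One widely used measure for speech recognition is word error rate (WER), the word-level edit distance between a reference transcript and the recognized text divided by the number of reference words, with accuracy often quoted as 1 minus WER. The sketch below assumes that metric so you can check results against your own recordings; `word_error_rate` is a hypothetical helper, not part of LiberSonora.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, one wrong word in a five-word reference gives a WER of 0.2, which corresponds to 80% word accuracy under this definition.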
Q: Can I use LiberSonora with my existing audiobook management software?
A: Yes, LiberSonora is designed to complement existing audiobook management systems. It processes your audio files and generates subtitles and metadata that can be used with other applications. Additionally, its API support allows for integration with custom workflows and other software solutions if you have development resources.
Q: How do I deploy LiberSonora on my system?
A: LiberSonora uses Docker for easy deployment. After cloning the repository from GitHub, you can start the application using the provided docker-compose file with the command 'docker-compose -f docker-compose.gpu.yml up -d'. This will set up all necessary services. Detailed installation instructions are available in the project's README file on GitHub.


