LiberSonora

An AI-powered, open-source audiobook toolkit with intelligent subtitle extraction, AI title generation, and multilingual translation.


What is LiberSonora

LiberSonora, meaning "Free Voice", is an AI-powered, open-source audiobook toolkit that can run entirely offline, with GPU acceleration where available. It addresses common audiobook management problems through intelligent subtitle extraction, AI title generation, and multilingual translation. The design is modular: each service can be used on its own or as part of the complete pipeline.

The toolkit supports several large language model backends, including local Ollama, DeepSeek, and OpenAI, so users can choose where their AI processing runs. Batch processing handles large numbers of audiobooks without tedious per-file work, and audio files are processed locally, which removes file transfer steps. With a local model, all processing stays on your machine; content leaves it only if you configure an external API such as DeepSeek or OpenAI.

The project's long-term vision is a comprehensive audiobook ecosystem. The current implementation focuses on intelligent subtitle extraction, title generation, and multilingual support, the core features for organizing, accessing, and engaging with audiobook content.

How to Use

LiberSonora provides a user-friendly web interface that guides you through the entire process from uploading audio files to downloading the finished subtitles and translations.

Step 1: Installation

Clone the repository and use Docker to set up the environment with 'docker-compose -f docker-compose.gpu.yml up -d'.

Step 2: Configuration

Configure your LLM preferences and API settings in the configuration file.

Step 3: Processing

Upload your audiobook files and use the web interface to initiate processing tasks.

Step 4: Export

Download the processed subtitles, translations, and metadata for use with your audiobook collection.

Core Features

Multi-model Selection and Integration

LiberSonora offers users the flexibility to choose from multiple large language model (LLM) implementations, including local Ollama, DeepSeek, and OpenAI. This feature enables users to select the most appropriate AI model based on their specific needs and resource constraints. For offline environments, the system supports fully local deployment using Ollama, ensuring privacy and operation in air-gapped settings. Users with existing DeepSeek or OpenAI subscriptions can leverage those services through simple configuration changes, maximizing their existing investments.
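
As a rough illustration of why switching backends can be a small configuration change: Ollama, DeepSeek, and OpenAI each expose an OpenAI-compatible chat endpoint, so a provider can be described by a base URL, an API key, and a model name. The sketch below is not LiberSonora's actual configuration format; the `LLMConfig` class, the `resolve_llm` helper, and the model name are assumptions made for illustration.

```python
from dataclasses import dataclass

# Default base URLs of each provider's OpenAI-compatible API.
BASE_URLS = {
    "ollama": "http://localhost:11434/v1",  # local Ollama server, default port
    "deepseek": "https://api.deepseek.com",
    "openai": "https://api.openai.com/v1",
}


@dataclass(frozen=True)
class LLMConfig:
    """Hypothetical settings bundle; not LiberSonora's real config schema."""
    provider: str
    base_url: str
    model: str
    api_key: str


def resolve_llm(provider: str, model: str, api_key: str = "") -> LLMConfig:
    """Map a provider name to the connection settings a client would need."""
    key = provider.lower()
    if key not in BASE_URLS:
        raise ValueError(f"unsupported provider: {provider!r}")
    if key != "ollama" and not api_key:
        # Hosted providers need a key; a local Ollama server does not.
        raise ValueError(f"{provider} requires an API key")
    # Ollama ignores the key, so a placeholder keeps the field non-empty.
    return LLMConfig(key, BASE_URLS[key], model, api_key or "ollama")


local = resolve_llm("ollama", "qwen2.5:7b")  # example model name only
print(local.base_url)
```

Keeping the provider table in one place means an air-gapped setup only ever touches the `localhost` entry.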

Intelligent Subtitle Extraction

The platform incorporates state-of-the-art automatic speech recognition (ASR) to extract accurate subtitles from audiobook files. The system employs advanced audio preprocessing techniques that handle varying audio quality, background noise, and speaker variations commonly found in audiobooks. The subtitle extraction process includes intelligent timing alignment that accurately matches text to audio timing, creating properly synchronized subtitle files. The system supports batch processing of multiple audio files, enabling efficient subtitle extraction for entire audiobook collections.
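
To make "synchronized subtitle files" concrete, here is a minimal sketch that renders timed segments in the SRT format. The `(start, end, text)` segment shape and the function names are assumptions, not LiberSonora's internal data model. WebVTT differs mainly in its `WEBVTT` header and in using a period rather than a comma before the milliseconds.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as an SRT document."""
    blocks = []
    for number, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{number}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Each block ends with a newline, so joining on "\n" leaves one blank
    # line between cues, as SRT expects.
    return "\n".join(blocks)


print(to_srt([(0.0, 2.5, "Chapter one."), (2.5, 6.04, "It was a quiet morning.")]))
```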

AI-Powered Title Generation

LiberSonora utilizes advanced natural language processing to analyze subtitle content and generate appropriate, descriptive titles for audiobook files. The AI title generation system understands the context, themes, and key elements of the audio content, creating titles that accurately reflect the material. This feature addresses the common problem of unnamed or poorly named audiobook files, improving organization and searchability of collections. The system allows users to specify naming conventions and formats, ensuring consistency across their entire audiobook library.
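
The naming-convention step could look roughly like the following. The template syntax, the `build_filename` helper, and the sanitization rules are assumptions for illustration rather than LiberSonora's actual behavior; in practice the title string would come from the configured language model.

```python
import re


def build_filename(template: str, index: int, title: str, ext: str = ".mp3") -> str:
    """Fill a naming template and strip characters that are unsafe in filenames."""
    # Replace characters that Windows and many filesystems reject.
    safe_title = re.sub(r'[\\/:*?"<>|]+', " ", title)
    # Collapse the runs of whitespace left behind by the replacement.
    safe_title = re.sub(r"\s+", " ", safe_title).strip()
    return template.format(index=index, title=safe_title) + ext


name = build_filename("{index:03d} - {title}", 7, 'Chapter 7: "The Storm"')
print(name)
```

Zero-padding the index keeps files in listening order when a player sorts them by name.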

Multilingual Translation Support

The platform offers comprehensive multilingual capabilities for both subtitle extraction and translation. The subtitle system can detect and process multiple languages, while the translation features allow conversion between numerous language pairs. The translation engine maintains context awareness across paragraphs and chapters, preserving the narrative flow and specialized terminology throughout the audiobook. Users can select specific translation models optimized for literary content, ensuring higher quality results for audiobook material.
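
One common way to keep context across subtitle lines is to translate in small batches and pass the preceding lines along as read-only context. The sketch below shows only that batching step; the `context_batches` helper and its parameters are hypothetical, and the source does not describe how LiberSonora implements context awareness.

```python
from typing import Iterator


def context_batches(
    lines: list[str], batch_size: int = 3, context_size: int = 2
) -> Iterator[tuple[list[str], list[str]]]:
    """Yield (context, batch) pairs.

    `batch` holds the lines to translate; `context` holds the lines just
    before it, supplied for reference only.
    """
    if batch_size < 1 or context_size < 0:
        raise ValueError("batch_size must be >= 1 and context_size >= 0")
    for start in range(0, len(lines), batch_size):
        context = lines[max(0, start - context_size):start]
        yield context, lines[start:start + batch_size]


subtitles = ["Line 1", "Line 2", "Line 3", "Line 4", "Line 5"]
for context, batch in context_batches(subtitles):
    print(context, "->", batch)
```

Larger context windows give the model more of the narrative to work with, at the cost of more tokens per request.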

Complete Offline Operation

LiberSonora is designed to function entirely offline after initial setup. All audio processing, subtitle extraction, and translations happen locally on your machine using the selected AI models. No audio files or content are uploaded to external servers (unless you specifically configure an external API like OpenAI). This design ensures complete data privacy and makes LiberSonora suitable for sensitive content.

Integration Capabilities

API and Service Integration

RESTful API endpoints for integrating LiberSonora's capabilities into existing audiobook management workflows.

Docker Containerization

Complete Docker implementation for easy deployment and consistent environment across different systems.

Multi-LLM Support

Flexible integration with various large language models including local Ollama, DeepSeek, and OpenAI.

Subtitle Format Support

Export capabilities for various subtitle formats including SRT, VTT, and JSON for maximum compatibility.

Batch Processing Interface

Programmatic batch processing capabilities for handling large audiobook collections efficiently.
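
A batch run typically starts by collecting the files to process. This sketch walks a directory tree and keeps files whose extensions match the audio formats named in the FAQ; the `find_audio_files` helper and the extension list are assumptions for illustration, and the project's own batch interface may work differently.

```python
import tempfile
from pathlib import Path

# Extensions for the formats the FAQ lists; extend as needed.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".aac", ".ogg"}


def find_audio_files(root: Path) -> list[Path]:
    """Recursively collect audio files under `root`, sorted for a stable order."""
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
    )


# Demo against a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    book = Path(tmp) / "book"
    book.mkdir()
    for name in ("02.MP3", "01.mp3", "cover.jpg"):
        (book / name).touch()
    found = [p.name for p in find_audio_files(Path(tmp))]
    print(found)
```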

Use Cases

Audiobook Archive Management

Libraries and archivists use LiberSonora to process large collections of audiobooks, extracting subtitles and generating accurate metadata. This makes content searchable, accessible for the hearing impaired, and enriches catalog information. The batch processing capabilities enable efficient management of extensive collections.

Educational Content Creation

Educators use LiberSonora to create accessible versions of audio lectures and educational materials. By extracting subtitles and translating content into multiple languages, educational resources become available to diverse audiences, including international students and those with hearing disabilities.

Personal Audiobook Library Organization

Individual audiobook enthusiasts use LiberSonora to organize personal collections by generating accurate titles and extracting content details. This makes finding specific content easier and enables text-based searching through audio collections. The offline capabilities ensure privacy for personal collections.

Multilingual Content Distribution

Content creators and publishers use LiberSonora to prepare audiobooks for international distribution. By extracting and translating subtitles, they can quickly create multilingual versions of their content, expanding their potential audience without requiring new recordings.

FAQ

Q: What hardware requirements does LiberSonora have?

A: LiberSonora benefits from GPU acceleration, particularly when processing multiple files in batch mode, and works with NVIDIA GPUs that support CUDA. For best performance, at least 8 GB of GPU memory and 16 GB of system RAM are recommended. The application also runs on CPU-only systems, with reduced performance. Docker is required for the containerized environment.

Q: How does LiberSonora ensure my data privacy?

A: After initial setup, all audio processing, subtitle extraction, and translation run locally on your machine using the AI models you select. No audio or text is sent to external servers unless you explicitly configure an external API such as OpenAI or DeepSeek, which makes a fully local setup suitable for sensitive content.

Q: Which audio formats are supported?

A: LiberSonora supports most common audio formats including MP3, WAV, FLAC, AAC, and OGG. The system uses FFmpeg internally, so it inherits support for the wide range of formats that FFmpeg can process. For optimal results, uncompressed or lossless formats like WAV or FLAC provide the best quality for subtitle extraction, though compressed formats like MP3 work well in most cases.
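
Speech recognition models commonly expect mono audio at 16 kHz, so a preprocessing step often normalizes input with FFmpeg first. The sketch below only builds the command. Whether LiberSonora uses these exact settings is an assumption, and `ffmpeg_to_wav_cmd` is a hypothetical helper; the flags themselves (`-y`, `-i`, `-ac`, `-ar`) are standard FFmpeg options.

```python
from pathlib import Path


def ffmpeg_to_wav_cmd(src: Path, dst_dir: Path, sample_rate: int = 16_000) -> list[str]:
    """Build an FFmpeg command that converts `src` to mono WAV at `sample_rate` Hz."""
    dst = dst_dir / (src.stem + ".wav")
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", str(src),           # input file, any format FFmpeg can decode
        "-ac", "1",               # downmix to a single audio channel
        "-ar", str(sample_rate),  # resample to the target rate
        str(dst),
    ]


cmd = ffmpeg_to_wav_cmd(Path("chapter01.mp3"), Path("out"))
print(" ".join(cmd))
# To execute it: subprocess.run(cmd, check=True), with FFmpeg installed.
```

Passing the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.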

Q: How accurate is the subtitle extraction?

A: The accuracy of subtitle extraction depends on several factors including audio quality, speaking clarity, background noise, and the chosen AI model. Under good conditions (clear speech, minimal background noise), accuracy rates of 90-95% are typical. The system performs best with professional audiobook recordings but can handle varying qualities. Users can also manually edit extracted subtitles through the interface if necessary.
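
The source does not say how these accuracy figures are measured. A common metric for speech recognition is word error rate (WER): the word-level edit distance between a reference text and the transcript, divided by the number of reference words, with accuracy often quoted as 1 - WER. A minimal sketch, assuming simple lowercase whitespace tokenization:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate(
    "the quick brown fox jumps over the lazy dog today",
    "the quick brown fox jumped over the lazy dog today",
)
print(f"WER: {wer:.0%}, accuracy: {1 - wer:.0%}")
```

Comparing a hand-checked excerpt against the extracted subtitles this way gives a quick estimate of how much manual editing a recording will need.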

Q: Can I use LiberSonora with my existing audiobook management software?

A: Yes, LiberSonora is designed to complement existing audiobook management systems. It processes your audio files and generates subtitles and metadata that can be used with other applications. Additionally, its API support allows for integration with custom workflows and other software solutions if you have development resources.

Q: How do I deploy LiberSonora on my system?

A: LiberSonora uses Docker for easy deployment. After cloning the repository from GitHub, you can start the application using the provided docker-compose file with the command 'docker-compose -f docker-compose.gpu.yml up -d'. This will set up all necessary services. Detailed installation instructions are available in the project's README file on GitHub.

Repository Data

Stars: 335
License: MIT
Languages: Python 95.4%, Shell 4.6% (based on repository file analysis)
Top contributor: wangerzi (46 commits)