mexca.text.transcription
Transcribe speech from audio to text.
Module Contents
Classes
AudioTranscriber | Transcribe speech from audio to text.
Functions
Command line interface for audio transcription.
- class mexca.text.transcription.AudioTranscriber(whisper_model: str | None = 'small', device: str | torch.device | None = 'cpu', sentence_rule: str | None = None)[source]
Transcribe speech from audio to text.
- Parameters:
whisper_model (str, optional, default='small') – The name of the whisper model that is used for transcription. Available models are [‘tiny.en’, ‘tiny’, ‘base.en’, ‘base’, ‘small.en’, ‘small’, ‘medium.en’, ‘medium’, ‘large’].
device (str or torch.device, optional, default='cpu') – The name of the device onto which the whisper model should be loaded and run. If CUDA support is available, this can be ‘cuda’, otherwise use ‘cpu’ (the default).
sentence_rule (str, optional) – A regular expression used to split segment transcripts into sentences. If None (default), the text is split at every ‘.’, ‘?’, ‘!’, and ‘:’ character that is followed by whitespace. No split is made after single or multiple words abbreviated with dots (e.g., ‘Nr. ‘ and ‘e.g. ‘).
- apply(filepath: str, audio_annotation: mexca.data.SpeakerAnnotation, language: str | None = None, options: whisper.DecodingOptions | None = None, show_progress: bool = True) → mexca.data.AudioTranscription [source]
Transcribe speech in an audio file to text.
Transcribe each annotated speech segment in the audio file and split the transcription into sentences according to sentence_rule.
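A typical call might look like the sketch below. It uses only the constructor and apply signatures documented on this page; the wrapper function transcribe_file is hypothetical, and the audio_annotation argument is assumed to have been produced beforehand by the SpeakerIdentifier component. The import sits inside the function so that the sketch can be defined without mexca installed.

```python
def transcribe_file(filepath, audio_annotation, language=None):
    """Hypothetical helper: transcribe one annotated audio file on the CPU.

    audio_annotation is assumed to be a mexca.data.SpeakerAnnotation
    obtained from the SpeakerIdentifier component.
    """
    # Imported lazily; requires mexca to be installed when the function runs.
    from mexca.text.transcription import AudioTranscriber

    # 'small' and 'cpu' are the documented defaults, written out for clarity.
    transcriber = AudioTranscriber(whisper_model="small", device="cpu")

    # Returns a mexca.data.AudioTranscription according to the signature.
    return transcriber.apply(
        filepath,
        audio_annotation,
        language=language,
        show_progress=False,
    )
```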
- Parameters:
filepath (str) – Path to the audio file.
audio_annotation (SpeakerAnnotation) – The audio annotation object returned by the SpeakerIdentifier component.
language (str, optional, default=None) – The language that is transcribed. Ignored if options.language is not None.
options (whisper.DecodingOptions, optional) – Options for transcribing the audio file. If None, transcription runs without timestamps and with a number format that depends on CUDA availability: FP16 (half-precision floating point) if CUDA is available, FP32 (single-precision floating point) otherwise.
show_progress (bool, optional, default=True) – Whether a progress bar is displayed or not.
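The interaction between language and options can be sketched as follows. This is one reading of the parameter descriptions above, not the library's implementation: OptionsSketch is a hypothetical stand-in for whisper.DecodingOptions that keeps only three of its fields, resolve_options is a hypothetical helper, and cuda_available is passed in explicitly instead of being queried from torch. The case where options is given but options.language is None is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class OptionsSketch:
    """Hypothetical stand-in for whisper.DecodingOptions (subset of fields)."""

    language: Optional[str] = None
    without_timestamps: bool = False
    fp16: bool = True


def resolve_options(language, options, cuda_available):
    """Return the options that would apply under the documented rules."""
    if options is None:
        # Documented default: no timestamps, FP16 only when CUDA is available.
        return OptionsSketch(
            language=language,
            without_timestamps=True,
            fp16=cuda_available,
        )
    if options.language is None and language is not None:
        # Assumption: the language argument fills in a missing options.language.
        return replace(options, language=language)
    # options.language is set, so the language argument is ignored.
    return options


default_cpu = resolve_options("en", None, cuda_available=False)
explicit = resolve_options("en", OptionsSketch(language="nl"), cuda_available=True)
print(default_cpu)
print(explicit.language)
```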
- Returns:
A data class object containing transcribed speech segments split into sentences.
- Return type: