mexca.text.transcription

Transcribe speech from audio to text.

Module Contents

Classes

AudioTranscriber

Transcribe speech from audio to text.

Functions

cli()

Command line interface for audio transcription.

class mexca.text.transcription.AudioTranscriber(whisper_model: str | None = 'small', device: str | torch.device | None = 'cpu', sentence_rule: str | None = None)[source]

Transcribe speech from audio to text.

Parameters:
  • whisper_model (str, optional, default='small') – The name of the whisper model that is used for transcription. Available models are [‘tiny.en’, ‘tiny’, ‘base.en’, ‘base’, ‘small.en’, ‘small’, ‘medium.en’, ‘medium’, ‘large’].

  • device (str or torch.device, optional, default='cpu') – The name of the device onto which the whisper model should be loaded and run. If CUDA support is available, this can be ‘cuda’; otherwise, use ‘cpu’ (the default).

  • sentence_rule (str, optional) – A regular expression used to split segment transcripts into sentences. If None (default), the text is split at every ‘.’, ‘?’, ‘!’, and ‘:’ character that is followed by whitespace, except after single or multiple words abbreviated with dots (e.g., ‘Nr.’ and ‘e.g.’).
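A custom sentence_rule can be any regular expression suitable for splitting text. The pattern below is only an illustration in the spirit of the default behavior described above (the exact built-in pattern is internal to the class): it splits after ‘.’, ‘?’, ‘!’, or ‘:’ followed by whitespace, while skipping dot abbreviations.

```python
import re

# Illustrative sentence rule (NOT the class's built-in default): split after
# '.', '?', '!', or ':' followed by whitespace, but not after dotted
# abbreviations such as 'e.g.' or title abbreviations such as 'Dr.'.
SENTENCE_RULE = r"(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=[.?!:])\s+"

text = "Dr. Smith arrived, e.g. at noon. She spoke: the talk was short! Questions?"
sentences = re.split(SENTENCE_RULE, text)
# 'Dr.' and 'e.g.' do not trigger a split; '.', ':', and '!' followed by
# whitespace do.
```

A pattern like this could then be passed as `AudioTranscriber(sentence_rule=...)` to override the default splitting.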

property transcriber: whisper.Whisper[source]

The loaded whisper model for audio transcription.

apply(filepath: str, audio_annotation: mexca.data.SpeakerAnnotation, language: str | None = None, options: whisper.DecodingOptions | None = None, show_progress: bool = True) mexca.data.AudioTranscription[source]

Transcribe speech in an audio file to text.

Transcribe each annotated speech segment in the audio file and split the transcription into sentences according to sentence_rule.

Parameters:
  • filepath (str) – Path to the audio file.

  • audio_annotation (SpeakerAnnotation) – The audio annotation object returned by the SpeakerIdentifier component.

  • language (str, optional, default=None) – The language of the transcribed speech. Ignored if options.language is not None.

  • options (whisper.DecodingOptions, optional) – Options for transcribing the audio file. If None, the defaults from get_default_options() are used; in particular, the number format is FP16 (half-precision floating point) if CUDA is available and FP32 (single-precision floating point) otherwise.

  • show_progress (bool, optional, default=True) – Whether a progress bar is displayed.

Returns:

A data class object containing transcribed speech segments split into sentences.

Return type:

AudioTranscription
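Since apply() consumes the annotation produced by the SpeakerIdentifier component, a typical call chains the two. The sketch below assumes mexca is installed, that SpeakerIdentifier is importable from mexca.audio.identification, and that it exposes an apply(filepath) method; the file name is hypothetical, so check your mexca version for the exact import paths and signatures.

```python
# Sketch only: import paths, the SpeakerIdentifier signature, and the file
# name are assumptions, not confirmed by this reference.
from mexca.audio.identification import SpeakerIdentifier
from mexca.text.transcription import AudioTranscriber

filepath = "debate.wav"  # hypothetical audio file

# Annotate speaker segments first; apply() then transcribes each
# annotated segment and splits the result into sentences.
identifier = SpeakerIdentifier()
annotation = identifier.apply(filepath)

transcriber = AudioTranscriber(whisper_model="small", device="cpu")
transcription = transcriber.apply(filepath, annotation, language="en")
```

The returned AudioTranscription object then holds the transcribed segments split into sentences according to sentence_rule.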

static get_default_options(language: str | None = None) whisper.DecodingOptions[source]

Set default options for transcription.

Sets the language option as well as without_timestamps=False and fp16=torch.cuda.is_available().

Return type:

whisper.DecodingOptions

mexca.text.transcription.cli()[source]

Command line interface for audio transcription. See transcribe -h for details.