mexca.text.transcription
========================

.. py:module:: mexca.text.transcription

.. autoapi-nested-parse::

   Transcribe speech from audio to text.


Classes
-------

.. autoapisummary::

   mexca.text.transcription.AudioTranscriber


Functions
---------

.. autoapisummary::

   mexca.text.transcription.cli


Module Contents
---------------

.. py:class:: AudioTranscriber(whisper_model: Optional[str] = 'small', device: Optional[Union[str, torch.device]] = 'cpu', sentence_rule: Optional[str] = None)

   Transcribe speech from audio to text.

   :param whisper_model: The name of the whisper model used for transcription. Available models are
                         `['tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small', 'medium.en', 'medium', 'large']`.
   :type whisper_model: str, optional, default='small'
   :param device: The name of the device onto which the whisper model is loaded and on which it runs.
                  If CUDA support is available, this can be `'cuda'`; otherwise use `'cpu'` (the default).
   :type device: str or torch.device, optional, default='cpu'
   :param sentence_rule: A regular expression used to split segment transcripts into sentences.
                         If `None` (default), the text is split at every '.', '?', '!', and ':' character
                         that is followed by whitespace. Abbreviations written with dots, whether one word
                         or several (e.g., 'Nr. ' and 'e.g. '), do not trigger a split.
   :type sentence_rule: str, optional

   .. py:property:: transcriber
      :type: whisper.Whisper

      The loaded whisper model for audio transcription.


   .. py:method:: apply(filepath: str, audio_annotation: mexca.data.SpeakerAnnotation, language: Optional[str] = None, options: Optional[whisper.DecodingOptions] = None, show_progress: bool = True) -> mexca.data.AudioTranscription

      Transcribe speech in an audio file to text.

      Transcribe each annotated speech segment in the audio file and split the transcription
      into sentences according to `sentence_rule`.

      :param filepath: Path to the audio file.
      :type filepath: str
      :param audio_annotation: The audio annotation object returned by the `SpeakerIdentifier` component.
      :type audio_annotation: SpeakerAnnotation
      :param language: The language of the speech to transcribe. Ignored if `options.language` is not `None`.
      :type language: str, optional, default=None
      :param options: Options for transcribing the audio file. If `None`, transcription uses
                      `without_timestamps=False` and a number format that depends on whether CUDA is
                      available: FP16 (half-precision floating point) if it is, FP32 (single-precision
                      floating point) otherwise.
      :type options: whisper.DecodingOptions, optional
      :param show_progress: Whether to display a progress bar.
      :type show_progress: bool, optional, default=True

      :returns: A data class object containing the transcribed speech segments split into sentences.
      :rtype: AudioTranscription


   .. py:method:: get_default_options(language: Optional[str] = None) -> whisper.DecodingOptions
      :staticmethod:

      Set default options for transcription.

      Sets the language as well as `without_timestamps=False` and `fp16=torch.cuda.is_available()`.

      :rtype: whisper.DecodingOptions


.. py:function:: cli()

   Command line interface for audio transcription.

   See `transcribe -h` for details.
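
The splitting behaviour described for `sentence_rule` can be approximated with a plain regular expression. The following is a minimal sketch using only the standard `re` module. The pattern, the name `SENTENCE_RULE`, and the helper `split_sentences` are assumptions made for illustration; they are not the default used by `AudioTranscriber`, which may differ.

```python
import re
from typing import List

# Hypothetical pattern for illustration; not necessarily the library default.
SENTENCE_RULE = re.compile(
    r"(?<!\w\.\w.)"       # no split after dotted abbreviations such as 'e.g.'
    r"(?<![A-Z][a-z]\.)"  # no split after capitalized two-letter abbreviations such as 'Nr.'
    r"(?<=[.?!:])"        # split only directly after '.', '?', '!' or ':'
    r"\s+"                # consume the whitespace that follows
)


def split_sentences(text: str) -> List[str]:
    """Split text at whitespace matched by SENTENCE_RULE, dropping empty parts."""
    return [part for part in SENTENCE_RULE.split(text) if part]


print(split_sentences("Call Nr. 5 now! Is it e.g. this one? Yes: it is."))
# ['Call Nr. 5 now!', 'Is it e.g. this one?', 'Yes:', 'it is.']
```

A pattern string such as `SENTENCE_RULE.pattern` could be supplied as `sentence_rule`, assuming the component applies it in a split-style call like the one above. Note that this sketch also refuses to split after any capitalized two-letter word followed by a dot, which is a trade-off of the second lookbehind.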