mexca.text.transcription
========================

.. py:module:: mexca.text.transcription

.. autoapi-nested-parse::

   Transcribe speech from audio to text.


Classes
-------

.. autoapisummary::

   mexca.text.transcription.AudioTranscriber


Functions
---------

.. autoapisummary::

   mexca.text.transcription.cli


Module Contents
---------------

.. py:class:: AudioTranscriber(whisper_model: Optional[str] = 'small', device: Optional[Union[str, torch.device]] = 'cpu', sentence_rule: Optional[str] = None)

   Transcribe speech from audio to text.

   :param whisper_model: The name of the whisper model that is used for transcription. Available models are
                         `['tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small', 'medium.en', 'medium', 'large']`.
   :type whisper_model: str, optional, default='small'
   :param device: The name of the device onto which the whisper model should be loaded and run. If CUDA support is
                  available, this can be `'cuda'`; otherwise, use `'cpu'` (the default).
   :type device: str or torch.device, optional, default='cpu'
   :param sentence_rule: A regular expression used to split segment transcripts into sentences. If `None` (default), the
                         text is split at every '.', '?', '!', and ':' character that is followed by whitespace.
                         Abbreviations of one or more words written with dots (e.g., 'Nr. ' and 'e.g. ') are not
                         treated as sentence boundaries.
   :type sentence_rule: str, optional


   .. py:property:: transcriber
      :type: whisper.Whisper


      The loaded whisper model for audio transcription.


   .. py:method:: apply(filepath: str, audio_annotation: mexca.data.SpeakerAnnotation, language: Optional[str] = None, options: Optional[whisper.DecodingOptions] = None, show_progress: bool = True) -> mexca.data.AudioTranscription

      Transcribe speech in an audio file to text.

      Transcribe each annotated speech segment in the audio file
      and split the transcription into sentences according to `sentence_rule`.

      :param filepath: Path to the audio file.
      :type filepath: str
      :param audio_annotation: The audio annotation object returned by the `SpeakerIdentifier` component.
      :type audio_annotation: SpeakerAnnotation
      :param language: The language of the speech to be transcribed. Ignored if `options.language` is not `None`.
      :type language: str, optional, default=None
      :param options: Options for transcribing the audio file. If `None`, transcription is done without timestamps,
                      and with a number format that depends on whether CUDA is available:
                      FP16 (half-precision floating points) if available,
                      FP32 (single-precision floating points) otherwise.
      :type options: whisper.DecodingOptions, optional
      :param show_progress: Whether a progress bar is displayed or not.
      :type show_progress: bool, optional, default=True

      :returns: A data class object containing transcribed speech segments split into sentences.
      :rtype: AudioTranscription


   .. py:method:: get_default_options(language: Optional[str] = None) -> whisper.DecodingOptions
      :staticmethod:


      Set default options for transcription.

      Sets language as well as `without_timestamps=False` and `fp16=torch.cuda.is_available()`.

      :rtype: whisper.DecodingOptions
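
The interplay between the `language` and `options` arguments of `apply` and the defaults listed for `get_default_options` can be sketched as follows. This is a minimal illustration and not the mexca implementation: `SketchOptions` is a hypothetical stand-in for `whisper.DecodingOptions` that carries only the three fields mentioned above, `resolve_options` is a hypothetical helper, and `cuda_available` replaces the call to `torch.cuda.is_available()` so that the example runs without PyTorch. Filling in `language` when `options.language` is `None` is an assumption drawn from the parameter description.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SketchOptions:
    """Hypothetical stand-in for whisper.DecodingOptions (only three fields)."""

    language: Optional[str] = None
    without_timestamps: bool = False
    fp16: bool = True


def default_options(language: Optional[str], cuda_available: bool) -> SketchOptions:
    # Mirrors the documented defaults: FP16 only if CUDA is available.
    return SketchOptions(
        language=language, without_timestamps=False, fp16=cuda_available
    )


def resolve_options(
    language: Optional[str],
    options: Optional[SketchOptions],
    cuda_available: bool,
) -> SketchOptions:
    # No options given: fall back to the defaults.
    if options is None:
        return default_options(language, cuda_available)
    # Assumption: `language` is only used if the options do not set one.
    if options.language is None:
        return replace(options, language=language)
    # Otherwise `language` is ignored in favour of `options.language`.
    return options
```

Passing explicit options with a language set therefore overrides the `language` argument, while `options=None` selects the precision from CUDA availability.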


.. py:function:: cli()

   Command line interface for audio transcription.
   See `transcribe -h` for details.
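
The default behaviour of the `sentence_rule` parameter of `AudioTranscriber` can be illustrated with a small sketch based on the standard `re` module. The pattern below is an assumption chosen to match the description (split at whitespace after '.', '?', '!', or ':', but not after dotted abbreviations such as 'Nr. ' or 'e.g. '); it is not necessarily the regular expression that mexca uses. `DEFAULT_RULE` and `split_sentences` are hypothetical names.

```python
import re
from typing import List, Optional

# Hypothetical default rule (an assumption, not taken from mexca):
# split at a whitespace character that follows '.', '?', '!' or ':',
# unless it follows a pattern like 'e.g.' or a capitalised
# two-letter abbreviation like 'Nr.'.
DEFAULT_RULE = r"(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=[.?!:])\s"


def split_sentences(text: str, rule: Optional[str] = None) -> List[str]:
    """Split a transcript into sentences with a regular expression."""
    pattern = re.compile(rule if rule is not None else DEFAULT_RULE)
    # Drop empty parts and surrounding whitespace left over from the split.
    return [part.strip() for part in pattern.split(text) if part.strip()]


sentences = split_sentences("See Nr. 5 for details. It works, e.g. here! Done: yes.")
# sentences == ['See Nr. 5 for details.', 'It works, e.g. here!', 'Done:', 'yes.']
```

A custom rule replaces the default entirely; for example, `split_sentences("a; b", rule=r";\s")` splits at semicolons only. A heuristic like this one also skips genuine sentence ends that resemble abbreviations (such as a sentence ending in a capitalised two-letter word).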