mexca.data
==========

.. py:module:: mexca.data

.. autoapi-nested-parse::

   Objects for storing multimodal data.


Attributes
----------

.. autoapisummary::

   mexca.data.EMPTY_VALUE
   mexca.data.ProbFloat
   mexca.data.FloatToStr
   mexca.data.FloatOrNone


Classes
-------

.. autoapisummary::

   mexca.data.BaseData
   mexca.data.BaseFeatures
   mexca.data.BaseAnnotation
   mexca.data.VideoAnnotation
   mexca.data.VoiceFeaturesConfig
   mexca.data.VoiceFeatures
   mexca.data.SegmentData
   mexca.data.SpeakerAnnotation
   mexca.data.TranscriptionData
   mexca.data.AudioTranscription
   mexca.data.SentimentData
   mexca.data.SentimentAnnotation
   mexca.data.Multimodal


Module Contents
---------------

.. py:data:: EMPTY_VALUE
   :value: None


   Value that is returned if a feature is not present.

.. py:data:: ProbFloat

   Probability float type.

   Restricts the range to [0, 1].

.. py:data:: FloatToStr

   Convert floats or integers to strings.

   Type that converts a float or integer to a string.
   Returns `None` for types other than :class:`float`, :class:`int`, and :class:`str`.

.. py:data:: FloatOrNone

   Convert NaN floats to `None`.

   Type that converts a float that is NaN into `None`.
   Also returns `None` for `None` input.
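
The three type aliases above can be read as small conversion rules. The following is a minimal sketch of those rules in plain Python, not the library's implementation. The helper names ``check_prob``, ``float_to_str``, and ``float_or_none`` are hypothetical, and the string format produced for floats is an assumption.

```python
import math
from typing import Optional


def check_prob(value: float) -> float:
    """Accept only values in the closed interval [0, 1] (ProbFloat rule)."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{value!r} is not in [0, 1]")
    return value


def float_to_str(value: object) -> Optional[str]:
    """Return a string for float, int, or str input, otherwise None (FloatToStr rule)."""
    if isinstance(value, str):
        return value
    if isinstance(value, (float, int)):
        # Assumption: plain str() formatting; the library may format differently.
        return str(value)
    return None


def float_or_none(value: Optional[float]) -> Optional[float]:
    """Map NaN and None to None, pass other floats through (FloatOrNone rule)."""
    if value is None or math.isnan(value):
        return None
    return value
```

In the module itself these rules are applied during validation of the data classes documented below, so they do not need to be called by hand.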

.. py:class:: BaseData(**data: Any)


   Base class for storing segment data.


.. py:class:: BaseFeatures(**data: Any)


   Base class for storing features.

   .. attribute:: filename

      Path to the video file. Must be a valid path.

      :type: pydantic.FilePath


   .. py:method:: from_json(filename: str, extra_filename: Optional[str] = None, encoding: str = 'utf-8')
      :classmethod:


      Load data from a JSON file.

      :param filename: Name of the JSON file from which the object should be loaded.
                       Must have a .json ending.
      :type filename: str


   .. py:method:: write_json(filename: str, encoding: str = 'utf-8')

      Store data in a JSON file.

      :param filename: Name of the destination file. Must have a .json ending.
      :type filename: str


.. py:class:: BaseAnnotation(**data: Any)


   Base class for storing annotations.

   .. attribute:: filename

      Name of annotated file. Must be a valid path.

      :type: pydantic.FilePath

   .. attribute:: segments

      Interval tree containing :class:`intervaltree.Interval` annotation segments.
      Annotation data is stored in the :attr:`data` attribute of each :class:`intervaltree.Interval`.

      :type: intervaltree.IntervalTree, optional, default=None


   .. py:attribute:: model_config

      Configuration for the model; should be a dictionary conforming to :class:`pydantic.config.ConfigDict`.


   .. py:method:: from_json(filename: str, extra_filename: Optional[str] = None, encoding: str = 'utf-8')
      :classmethod:


      Load data from a JSON file.

      :param filename: Name of the JSON file from which the object should be loaded.
                       Must have a .json ending.
      :type filename: str


   .. py:method:: write_json(filename: str, encoding: str = 'utf-8')

      Store data in a JSON file.

      :param filename: Name of the destination file. Must have a .json ending.
      :type filename: str


.. py:class:: VideoAnnotation(**data: Any)


   Video annotation class for storing facial features.

   .. attribute:: frame

      Index of each frame. Must be non-negative and in ascending order.

      :type: typing.List[pydantic.NonNegativeInt], default=list()

   .. attribute:: time

      Timestamp of each frame in seconds. Must be non-negative and in ascending order.

      :type: typing.List[pydantic.NonNegativeFloat], default=list()

   .. attribute:: face_box

      Bounding box of a detected face. Is `None` if no face was detected.

      :type: typing.List[typing.Optional[typing.List[pydantic.NonNegativeFloat]]], optional, default=list()

   .. attribute:: face_prob

      Probability of a detected face. Is `None` if no face was detected.

      :type: typing.List[ProbFloat], optional, default=list()

   .. attribute:: face_landmarks

      Facial landmarks of a detected face. Is `None` if no face was detected.

      :type: typing.List[typing.Optional[typing.List[typing.List[pydantic.NonNegativeFloat]]]], optional, default=list()

   .. attribute:: face_aus

      Facial action unit activations of a detected face. Is `None` if no face was detected.

      :type: typing.List[typing.Optional[typing.List[ProbFloat]]], optional, default=list()

   .. attribute:: face_label

      Label of a detected face. Is `None` if no face was detected.

      :type: typing.List[FloatToStr], optional, default=list()

   .. attribute:: face_embeddings

      Embedding vector (list of 512 float elements) for each detected face in the input video.

      :type: typing.List[typing.Optional[typing.List[float]]], optional, default=list()

   .. attribute:: face_confidence

      Confidence of the `face_label` assignment. Is `None` if no face was detected or
      only one face label was assigned.

      :type: typing.List[ProbFloat], optional, default=list()

   .. attribute:: face_average_embeddings

      Average embedding vector (list of 512 float elements) for each face in the input video.

      :type: typing.Dict[FloatToStr, typing.List[float]], optional, default=dict()


   .. py:attribute:: model_config

      Configuration for the model; should be a dictionary conforming to :class:`pydantic.config.ConfigDict`.
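
The ``frame`` and ``time`` attributes of :class:`VideoAnnotation` must be non-negative and in ascending order. The check below is a minimal sketch of that constraint, not the validator used by the library. ``check_frame_index`` is a hypothetical name, and reading "ascending" as non-decreasing (so that a frame index may repeat) is an assumption.

```python
from typing import List, Sequence


def check_frame_index(frame: Sequence[int]) -> List[int]:
    """Validate a frame index column: non-negative and in ascending order."""
    values = list(frame)
    if any(v < 0 for v in values):
        raise ValueError("frame indices must be non-negative")
    # Assumption: ascending means non-decreasing, so repeated indices are allowed.
    if any(a > b for a, b in zip(values, values[1:])):
        raise ValueError("frame indices must be in ascending order")
    return values


# A valid column with one repeated frame index.
checked = check_frame_index([0, 5, 5, 10])
```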


.. py:class:: VoiceFeaturesConfig(**data: Any)


   Configure the calculation of signal properties used for voice feature extraction.

   Create a pseudo-immutable object with attributes that are recognized by the
   :class:`VoiceExtractor` class and forwarded as arguments to signal property objects defined
   in :mod:`mexca.audio.features`. Details can be found in the feature class documentation.

   .. attribute:: frame_len

      Number of samples per frame.

      :type: pydantic.PositiveInt, default=1024

   .. attribute:: hop_len

      Number of samples between frame starting points.

      :type: pydantic.PositiveInt, default=256

   .. attribute:: center

      Whether the signal has been centered and padded before framing.

      :type: bool, default=True

   .. attribute:: pad_mode

      How the signal has been padded before framing. See :func:`numpy.pad`.
      Uses the default value 0 for `'constant'` padding.

      :type: str, default='constant'

   .. attribute:: spec_window

      The window that is applied before the STFT to obtain spectra.

      :type: _Window, default="hann"

   .. attribute:: pitch_lower_freq

      Lower limit used for pitch estimation (in Hz).

      :type: pydantic.NonNegativeFloat, default=75.0

   .. attribute:: pitch_upper_freq

      Upper limit used for pitch estimation (in Hz).

      :type: pydantic.NonNegativeFloat, default=600.0

   .. attribute:: pitch_method

      Method used for estimating voice pitch.

      :type: str, default="pyin"

   .. attribute:: pitch_n_harmonics

      Number of estimated pitch harmonics.

      :type: pydantic.PositiveInt, default=100

   .. attribute:: pitch_pulse_lower_period

      Lower limit for periods between glottal pulses for jitter and shimmer extraction.

      :type: pydantic.PositiveFloat, default=0.0001

   .. attribute:: pitch_pulse_upper_period

      Upper limit for periods between glottal pulses for jitter and shimmer extraction.

      :type: pydantic.PositiveFloat, default=0.02

   .. attribute:: pitch_pulse_max_period_ratio

      Maximum ratio between consecutive glottal periods for jitter and shimmer extraction.

      :type: pydantic.PositiveFloat, default=1.3

   .. attribute:: pitch_pulse_max_amp_factor

      Maximum ratio between consecutive amplitudes used for shimmer extraction.

      :type: pydantic.PositiveFloat, default=1.6

   .. attribute:: jitter_rel

      Divide jitter by the average pitch period.

      :type: bool, default=True

   .. attribute:: shimmer_rel

      Divide shimmer by the average pulse amplitude.

      :type: bool, default=True

   .. attribute:: hnr_lower_freq

      Lower fundamental frequency limit for choosing pitch candidates when computing the harmonics-to-noise ratio (HNR).

      :type: pydantic.PositiveFloat, default=75.0

   .. attribute:: hnr_rel_silence_threshold

      Relative threshold for treating signal frames as silent when computing the HNR.

      :type: pydantic.PositiveFloat, default=0.1

   .. attribute:: formants_max

      The maximum number of formants that are extracted.

      :type: pydantic.PositiveInt, default=5

   .. attribute:: formants_lower_freq

      Lower limit for formant frequencies (in Hz).

      :type: pydantic.NonNegativeFloat, default=50.0

   .. attribute:: formants_upper_freq

      Upper limit for formant frequencies (in Hz).

      :type: pydantic.NonNegativeFloat, default=5450.0

   .. attribute:: formants_signal_preemphasis_from

      Starting value for the applied preemphasis function (in Hz).

      :type: pydantic.NonNegativeFloat, optional, default=50.0

   .. attribute:: formants_window

      Window function that is applied before formant estimation.

      :type: _Window, default="praat_gaussian"

   .. attribute:: formants_amp_lower

      Lower boundary for formant peak amplitude search interval.

      :type: pydantic.PositiveFloat, optional, default=0.8

   .. attribute:: formants_amp_upper

      Upper boundary for formant peak amplitude search interval.

      :type: pydantic.PositiveFloat, optional, default=1.2

   .. attribute:: formants_amp_rel_f0

      Whether the formant amplitude is divided by the fundamental frequency amplitude.

      :type: bool, optional, default=True

   .. attribute:: alpha_ratio_lower_band

      Boundaries of the alpha ratio lower frequency band (start, end) in Hz.

      :type: tuple, default=(50.0, 1000.0)

   .. attribute:: alpha_ratio_upper_band

      Boundaries of the alpha ratio upper frequency band (start, end) in Hz.

      :type: tuple, default=(1000.0, 5000.0)

   .. attribute:: hammar_index_pivot_point_freq

      Point separating the Hammarberg index lower and upper frequency regions in Hz.

      :type: pydantic.PositiveFloat, default=2000.0

   .. attribute:: hammar_index_upper_freq

      Upper limit for the Hammarberg index upper frequency region in Hz.

      :type: pydantic.PositiveFloat, default=5000.0

   .. attribute:: spectral_slopes_bands

      Frequency bands in Hz for which spectral slopes are estimated.

      :type: tuple, default=((0.0, 500.0), (500.0, 1500.0))

   .. attribute:: mel_spec_n_mels

      Number of Mel filters.

      :type: pydantic.PositiveInt, default=26

   .. attribute:: mel_spec_lower_freq

      Lower frequency boundary for Mel spectrogram transformation in Hz.

      :type: pydantic.NonNegativeFloat, default=20.0

   .. attribute:: mel_spec_upper_freq

      Upper frequency boundary for Mel spectrogram transformation in Hz.

      :type: pydantic.NonNegativeFloat, default=8000.0

   .. attribute:: mfcc_n

      Number of Mel frequency cepstral coefficients (MFCCs) that are estimated per frame.

      :type: pydantic.PositiveInt, default=4

   .. attribute:: mfcc_lifter

      Cepstral liftering coefficient for MFCC estimation. Must be >= 0. If zero, no liftering is applied.

      :type: pydantic.NonNegativeFloat, default=22.0


   .. py:method:: from_yaml(filename: str)
      :classmethod:


      Load a voice configuration object from a YAML file.

      Uses safe YAML loading (supports only native YAML tags, not Python-specific tags).
      Converts loaded YAML sequences to tuples.

      :param filename: Path to the YAML file. Must have a .yml or .yaml ending.
      :type filename: str


   .. py:method:: write_yaml(filename: str)

      Write a voice configuration object to a YAML file.

      Uses safe YAML dumping (supports only native YAML tags, not Python-specific tags).

      :param filename: Path to the YAML file. Must have a .yml or .yaml ending.
      :type filename: str
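
Safe YAML loaders return sequences as Python lists, while band attributes such as ``alpha_ratio_lower_band`` are documented as tuples, which is presumably why ``from_yaml`` converts sequences. The following is a minimal sketch of such a conversion applied to an already parsed mapping. ``sequences_to_tuples`` is a hypothetical helper and not part of the module.

```python
from typing import Any


def sequences_to_tuples(value: Any) -> Any:
    """Recursively convert lists (parsed YAML sequences) to tuples."""
    if isinstance(value, list):
        return tuple(sequences_to_tuples(item) for item in value)
    if isinstance(value, dict):
        return {key: sequences_to_tuples(item) for key, item in value.items()}
    return value


# What a safe YAML loader would typically return for three config entries.
loaded = {
    "frame_len": 1024,
    "alpha_ratio_lower_band": [50.0, 1000.0],
    "spectral_slopes_bands": [[0.0, 500.0], [500.0, 1500.0]],
}
config_kwargs = sequences_to_tuples(loaded)
```

Nested sequences, as in ``spectral_slopes_bands``, become nested tuples, which matches the documented default of that attribute.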


.. py:class:: VoiceFeatures(**data: Any)


   Class for storing voice features.

   Features are stored as lists (like columns of a data frame).
   Optional features are initialized as empty lists.

   .. attribute:: frame

      The frame index for which features were extracted. Must be non-negative and in ascending order.

      :type: typing.List[pydantic.NonNegativeInt]

   .. attribute:: time

      The time stamp at which features were extracted. Must be non-negative and in ascending order.

      :type: typing.List[pydantic.NonNegativeFloat]


   .. py:attribute:: model_config

      Configuration for the model; should be a dictionary conforming to :class:`pydantic.config.ConfigDict`.


   .. py:method:: from_json(filename: str, extra_filename: Optional[str] = None, encoding: str = 'utf-8')
      :classmethod:


      Load data from a JSON file.

      :param filename: Name of the JSON file from which the object should be loaded.
                       Must have a .json ending.
      :type filename: str


.. py:class:: SegmentData(**data: Any)


   Class for storing speech segment data.

   .. attribute:: name

      Speaker label.

      :type: str

   .. attribute:: conf

      Confidence of speaker label.

      :type: ProbFloat, optional, default=None


.. py:class:: SpeakerAnnotation(**data: Any)


   Class for storing speaker and speech segment annotations.

   .. attribute:: filename

      Name of the annotated audio file. Must be a valid path.

      :type: pydantic.FilePath

   .. attribute:: channel

      Channel index.

      :type: int, optional

   .. attribute:: speaker_average_embeddings

      Average embedding vector for each speaker label.

      :type: typing.Dict[FloatToStr, List[float]], optional

   .. attribute:: segments

      Stores speech segments as :class:`intervaltree.Interval`.
      Speaker labels are stored in :class:`SegmentData` objects in the :attr:`data` attribute of each interval.

      :type: intervaltree.IntervalTree, optional


   .. py:method:: from_pyannote(annotation: pyannote.core.Annotation, embeddings: Optional[Dict[str, List[float]]] = None)
      :classmethod:


      Create a :class:`SpeakerAnnotation` object from a :class:`pyannote.core.Annotation` object.

      :param annotation: Annotation object containing speech segments and speaker labels.
      :type annotation: pyannote.core.Annotation


   .. py:method:: from_rttm(filename: str, extra_filename: Optional[str] = None)
      :classmethod:


      Load a speaker annotation from an RTTM file.

      :param filename: Path to the file. Must have an .rttm ending.
      :type filename: str


   .. py:method:: write_rttm(filename: str)

      Write a speaker annotation to an RTTM file.

      :param filename: Path to the file. Must have an .rttm ending.
      :type filename: str
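
An RTTM file stores one speaker turn per line with ten whitespace-separated fields, including the turn onset, the turn duration, and the speaker name. The sketch below parses a single line into a start, an end, and a label, which is the information an interval with :class:`SegmentData` carries. It is an illustration under those assumptions, not the parser used by ``from_rttm``. ``Turn`` and ``parse_rttm_line`` are hypothetical names.

```python
from typing import NamedTuple


class Turn(NamedTuple):
    start: float
    end: float
    name: str


def parse_rttm_line(line: str) -> Turn:
    """Parse one SPEAKER line of an RTTM file into a speech segment."""
    fields = line.split()
    if len(fields) != 10 or fields[0] != "SPEAKER":
        raise ValueError(f"not an RTTM SPEAKER line: {line!r}")
    onset, duration = float(fields[3]), float(fields[4])
    # RTTM stores onset and duration; an interval needs start and end.
    return Turn(start=onset, end=onset + duration, name=fields[7])


turn = parse_rttm_line("SPEAKER debate 1 1.50 2.25 <NA> <NA> SPEAKER_00 <NA> <NA>")
```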


.. py:class:: TranscriptionData(**data: Any)


   Class for storing transcription data.

   .. attribute:: index

      Index of the transcribed sentence.

      :type: int

   .. attribute:: text

      Transcribed text.

      :type: str

   .. attribute:: speaker

      Speaker of the transcribed text.

      :type: str, optional, default=None

   .. attribute:: confidence

      Average word probability of transcribed text.

      :type: ProbFloat, optional, default=None


.. py:class:: AudioTranscription(**data: Any)


   Class for storing audio transcriptions.

   .. attribute:: filename

      Name of the transcribed audio file. Must be a valid path.

      :type: pydantic.FilePath

   .. attribute:: segments

      Interval tree containing the transcribed speech segments split into sentences as intervals.
      The transcribed sentences are stored in the `data` attribute of each interval.

      :type: intervaltree.IntervalTree, optional, default=None


   .. py:property:: subtitles

      Deprecated alias for `segments`.


   .. py:method:: from_srt(filename: str, extra_filename: Optional[str] = None)
      :classmethod:


      Load an audio transcription from an SRT file.

      :param filename: Name of the file to be loaded. Must have an .srt ending.
      :type filename: str


   .. py:method:: write_srt(filename: str)

      Write an audio transcription to an SRT file.

      :param filename: Name of the file to write to. Must have an .srt ending.
      :type filename: str
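
SRT files consist of numbered blocks, each with a ``start --> end`` line in ``HH:MM:SS,mmm`` format followed by the subtitle text. The sketch below shows how a segment given in seconds maps onto that layout. It is not the code behind ``write_srt``, and ``srt_timestamp`` and ``srt_block`` are hypothetical helpers.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def srt_block(index: int, start: float, end: float, text: str) -> str:
    """Build one numbered SRT subtitle block."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"


block = srt_block(1, 0.0, 2.5, "Hello.")
```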


.. py:class:: SentimentData(**data: Any)


   Class for storing sentiment data.

   .. attribute:: text

      Text of the sentence for which sentiment scores were predicted.

      :type: str

   .. attribute:: pos

      Positive sentiment score.

      :type: ProbFloat

   .. attribute:: neg

      Negative sentiment score.

      :type: ProbFloat

   .. attribute:: neu

      Neutral sentiment score.

      :type: ProbFloat


.. py:class:: SentimentAnnotation(**data: Any)


   Class for storing sentiment scores of transcribed sentences.

   Stores sentiment scores as intervals in an interval tree. The scores are stored in the `data` attribute of each interval.

   .. attribute:: filename

      Name of the file from which sentiment was extracted. Must be a valid path.

      :type: pydantic.FilePath


.. py:class:: Multimodal(**data: Any)


   Class for storing multimodal features.

   See the :ref:`Output` section for details.

   .. attribute:: filename

      Name of the video file. Must be a valid path.

      :type: pydantic.FilePath

   .. attribute:: duration

      Video duration in seconds.

      :type: pydantic.NonNegativeFloat, optional, default=None

   .. attribute:: fps

      Frames per second.

      :type: pydantic.PositiveFloat

   .. attribute:: fps_adjusted

      Frames per second, adjusted for skipped frames.
      Mostly needed for internal computations.

      :type: pydantic.PositiveFloat

   .. attribute:: video_annotation

      Object containing facial features.

      :type: VideoAnnotation

   .. attribute:: audio_annotation

      Object containing speech segments and speakers.

      :type: SpeakerAnnotation

   .. attribute:: voice_features

      Object containing voice features.

      :type: VoiceFeatures

   .. attribute:: transcription

      Object containing transcribed speech segments split into sentences.

      :type: AudioTranscription

   .. attribute:: sentiment

      Object containing sentiment scores for transcribed sentences.

      :type: SentimentAnnotation

   .. attribute:: features

      Merged features stored in a :class:`polars.LazyFrame` object that uses lazy evaluation.
      Call :meth:`polars.LazyFrame.collect` to trigger evaluation.

      :type: polars.LazyFrame


   .. py:attribute:: model_config

      Configuration for the model; should be a dictionary conforming to :class:`pydantic.config.ConfigDict`.


   .. py:method:: merge_features() -> polars.LazyFrame

      Merge multimodal features from pipeline components into a common data frame.

      Transforms and merges the available output stored in the `Multimodal` object
      based on the `'frame'` variable. Stores the merged features as a :class:`polars.LazyFrame`
      in the `features` attribute.

      :returns: Merged multimodal features.
      :rtype: polars.LazyFrame
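
Merging on the frame index can be pictured as a join of column tables that share a ``frame`` column. The sketch below illustrates the idea with plain dictionaries. It is not the implementation of ``merge_features``: the outer join, the assumption of one row per frame in each table, and the names ``merge_on_frame`` and ``voice_feature`` are all hypothetical.

```python
from typing import Dict, List

Column = Dict[str, List]


def merge_on_frame(left: Column, right: Column) -> Column:
    """Outer-join two column dictionaries on their 'frame' column."""
    frames = sorted(set(left["frame"]) | set(right["frame"]))
    merged: Column = {"frame": frames}
    for table in (left, right):
        # Map each frame index to its row position in this table.
        position = {f: i for i, f in enumerate(table["frame"])}
        for name, values in table.items():
            if name == "frame":
                continue
            # Frames missing from this table are filled with None.
            merged[name] = [
                values[position[f]] if f in position else None for f in frames
            ]
    return merged


video = {"frame": [0, 1], "face_prob": [0.9, 0.8]}
voice = {"frame": [1, 2], "voice_feature": [120.0, 118.0]}
merged = merge_on_frame(video, voice)
```

Frames that appear in only one table are kept and padded with ``None``, which mirrors the role of ``EMPTY_VALUE`` for features that are not present.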