mexca.data
==========

.. py:module:: mexca.data

.. autoapi-nested-parse::

   Objects for storing multimodal data.


Attributes
----------

.. autoapisummary::

   mexca.data.EMPTY_VALUE
   mexca.data.ProbFloat
   mexca.data.FloatToStr
   mexca.data.FloatOrNone


Classes
-------

.. autoapisummary::

   mexca.data.BaseData
   mexca.data.BaseFeatures
   mexca.data.BaseAnnotation
   mexca.data.VideoAnnotation
   mexca.data.VoiceFeaturesConfig
   mexca.data.VoiceFeatures
   mexca.data.SegmentData
   mexca.data.SpeakerAnnotation
   mexca.data.TranscriptionData
   mexca.data.AudioTranscription
   mexca.data.SentimentData
   mexca.data.SentimentAnnotation
   mexca.data.Multimodal


Module Contents
---------------

.. py:data:: EMPTY_VALUE
   :value: None

   Value that is returned if a feature is not present.


.. py:data:: ProbFloat

   Probability float type. Restricts the range to [0, 1].


.. py:data:: FloatToStr

   Convert floats or integers to strings.

   Type that converts a float or integer to a string. Returns `None` for other types than :class:`float`, :class:`int`, :class:`str`.


.. py:data:: FloatOrNone

   Convert nan float types to None types.

   Type that converts a float that is nan into a None type. Returns also None for None types.


.. py:class:: BaseData(**data: Any)

   Base class for storing segment data.


.. py:class:: BaseFeatures(**data: Any)

   Base class for storing features.

   .. attribute:: filename

      Path to the video file. Must be a valid path.

      :type: pydantic.FilePath

   .. py:method:: from_json(filename: str, extra_filename: Optional[str] = None, encoding: str = 'utf-8')
      :classmethod:

      Load data from a JSON file.

      :param filename: Name of the JSON file from which the object should be loaded. Must have a .json ending.
      :type filename: str

   .. py:method:: write_json(filename: str, encoding: str = 'utf-8')

      Store data in a JSON file.

      :param filename: Name of the destination file. Must have a .json ending.
      :type filename: str


.. py:class:: BaseAnnotation(**data: Any)

   Base class for storing annotations.

   .. attribute:: filename

      Name of annotated file.
      Must be a valid path.

      :type: pydantic.FilePath

   .. attribute:: segments

      Interval tree containing :class:`intervaltree.Interval` annotation segments. Annotation data is stored in the :attr:`data` attribute of each :class:`intervaltree.Interval`.

      :type: intervaltree.IntervalTree, optional, default=None

   .. py:attribute:: model_config

      Configuration for the model. Should be a dictionary conforming to :class:`pydantic.config.ConfigDict`.

   .. py:method:: from_json(filename: str, extra_filename: Optional[str] = None, encoding: str = 'utf-8')
      :classmethod:

      Load data from a JSON file.

      :param filename: Name of the JSON file from which the object should be loaded. Must have a .json ending.
      :type filename: str

   .. py:method:: write_json(filename: str, encoding: str = 'utf-8')

      Store data in a JSON file.

      :param filename: Name of the destination file. Must have a .json ending.
      :type filename: str


.. py:class:: VideoAnnotation(**data: Any)

   Video annotation class for storing facial features.

   .. attribute:: frame

      Index of each frame. Must be non-negative and in ascending order.

      :type: typing.List[pydantic.NonNegativeInt], default=list()

   .. attribute:: time

      Timestamp of each frame in seconds. Must be non-negative and in ascending order.

      :type: typing.List[pydantic.NonNegativeFloat], default=list()

   .. attribute:: face_box

      Bounding box of a detected face. Is `None` if no face was detected.

      :type: typing.List[typing.Optional[typing.List[pydantic.NonNegativeFloat]]], optional, default=list()

   .. attribute:: face_prob

      Probability of a detected face. Is `None` if no face was detected.

      :type: typing.List[ProbFloat], optional, default=list()

   .. attribute:: face_landmarks

      Facial landmarks of a detected face. Is `None` if no face was detected.

      :type: typing.List[typing.Optional[typing.List[typing.List[pydantic.NonNegativeFloat]]]], optional, default=list()

   .. attribute:: face_aus

      Facial action unit activations of a detected face. Is `None` if no face was detected.

      :type: typing.List[typing.Optional[typing.List[ProbFloat]]], optional, default=list()

   .. attribute:: face_label

      Label of a detected face. Is `None` if no face was detected.

      :type: typing.List[FloatToStr], optional, default=list()

   .. attribute:: face_embeddings

      Embedding vector (list of 512 float elements) for each detected face in the input video.

      :type: typing.List[typing.Optional[typing.List[float]]], optional, default=list()

   .. attribute:: face_confidence

      Confidence of the `face_label` assignment. Is `None` if no face was detected or only one face label was assigned.

      :type: typing.List[ProbFloat], optional, default=list()

   .. attribute:: face_average_embeddings

      Average embedding vector (list of 512 float elements) for each face in the input video.

      :type: typing.Dict[FloatToStr, typing.List[float]], optional, default=dict()

   .. py:attribute:: model_config

      Configuration for the model. Should be a dictionary conforming to :class:`pydantic.config.ConfigDict`.


.. py:class:: VoiceFeaturesConfig(**data: Any)

   Configure the calculation of signal properties used for voice feature extraction.

   Create a pseudo-immutable object with attributes that are recognized by the :class:`VoiceExtractor` class and forwarded as arguments to signal property objects defined in :mod:`mexca.audio.features`. Details can be found in the feature class documentation.

   .. attribute:: frame_len

      Number of samples per frame.

      :type: pydantic.PositiveInt, default=1024

   .. attribute:: hop_len

      Number of samples between frame starting points.

      :type: pydantic.PositiveInt, default=256

   .. attribute:: center

      Whether the signal has been centered and padded before framing.

      :type: bool, default=True

   .. attribute:: pad_mode

      How the signal has been padded before framing. See :func:`numpy.pad`. Uses the default value 0 for `'constant'` padding.

      :type: str, default='constant'

   .. attribute:: spec_window

      The window that is applied before the STFT to obtain spectra.

      :type: _Window, default="hann"

   .. attribute:: pitch_lower_freq

      Lower limit used for pitch estimation (in Hz).

      :type: pydantic.NonNegativeFloat, default=75.0

   .. attribute:: pitch_upper_freq

      Upper limit used for pitch estimation (in Hz).

      :type: pydantic.NonNegativeFloat, default=600.0

   .. attribute:: pitch_method

      Method used for estimating voice pitch.

      :type: str, default="pyin"

   .. attribute:: pitch_n_harmonics

      Number of estimated pitch harmonics.

      :type: pydantic.PositiveInt, default=100

   .. attribute:: pitch_pulse_lower_period

      Lower limit for periods between glottal pulses for jitter and shimmer extraction.

      :type: pydantic.PositiveFloat, default=0.0001

   .. attribute:: pitch_pulse_upper_period

      Upper limit for periods between glottal pulses for jitter and shimmer extraction.

      :type: pydantic.PositiveFloat, default=0.02

   .. attribute:: pitch_pulse_max_period_ratio

      Maximum ratio between consecutive glottal periods for jitter and shimmer extraction.

      :type: pydantic.PositiveFloat, default=1.3

   .. attribute:: pitch_pulse_max_amp_factor

      Maximum ratio between consecutive amplitudes used for shimmer extraction.

      :type: pydantic.PositiveFloat, default=1.6

   .. attribute:: jitter_rel

      Divide jitter by the average pitch period.

      :type: bool, default=True

   .. attribute:: shimmer_rel

      Divide shimmer by the average pulse amplitude.

      :type: bool, default=True

   .. attribute:: hnr_lower_freq

      Lower fundamental frequency limit for choosing pitch candidates when computing the harmonics-to-noise ratio (HNR).

      :type: pydantic.PositiveFloat, default=75.0

   .. attribute:: hnr_rel_silence_threshold

      Relative threshold for treating signal frames as silent when computing the HNR.

      :type: pydantic.PositiveFloat, default=0.1

   .. attribute:: formants_max

      The maximum number of formants that are extracted.

      :type: pydantic.PositiveInt, default=5

   .. attribute:: formants_lower_freq

      Lower limit for formant frequencies (in Hz).

      :type: pydantic.NonNegativeFloat, default=50.0

   .. attribute:: formants_upper_freq

      Upper limit for formant frequencies (in Hz).

      :type: pydantic.NonNegativeFloat, default=5450.0

   .. attribute:: formants_signal_preemphasis_from

      Starting value for the applied preemphasis function (in Hz).

      :type: pydantic.NonNegativeFloat, optional, default=50.0

   .. attribute:: formants_window

      Window function that is applied before formant estimation.

      :type: _Window, default="praat_gaussian"

   .. attribute:: formants_amp_lower

      Lower boundary for formant peak amplitude search interval.

      :type: pydantic.PositiveFloat, optional, default=0.8

   .. attribute:: formants_amp_upper

      Upper boundary for formant peak amplitude search interval.

      :type: pydantic.PositiveFloat, optional, default=1.2

   .. attribute:: formants_amp_rel_f0

      Whether the formant amplitude is divided by the fundamental frequency amplitude.

      :type: bool, optional, default=True

   .. attribute:: alpha_ratio_lower_band

      Boundaries of the alpha ratio lower frequency band (start, end) in Hz.

      :type: tuple, default=(50.0, 1000.0)

   .. attribute:: alpha_ratio_upper_band

      Boundaries of the alpha ratio upper frequency band (start, end) in Hz.

      :type: tuple, default=(1000.0, 5000.0)

   .. attribute:: hammar_index_pivot_point_freq

      Point separating the Hammarberg index lower and upper frequency regions in Hz.

      :type: pydantic.PositiveFloat, default=2000.0

   .. attribute:: hammar_index_upper_freq

      Upper limit for the Hammarberg index upper frequency region in Hz.

      :type: pydantic.PositiveFloat, default=5000.0

   .. attribute:: spectral_slopes_bands

      Frequency bands in Hz for which spectral slopes are estimated.

      :type: tuple, default=((0.0, 500.0), (500.0, 1500.0))

   .. attribute:: mel_spec_n_mels

      Number of Mel filters.

      :type: pydantic.PositiveInt, default=26

   .. attribute:: mel_spec_lower_freq

      Lower frequency boundary for Mel spectrogram transformation in Hz.

      :type: pydantic.NonNegativeFloat, default=20.0

   .. attribute:: mel_spec_upper_freq

      Upper frequency boundary for Mel spectrogram transformation in Hz.

      :type: pydantic.NonNegativeFloat, default=8000.0

   .. attribute:: mfcc_n

      Number of Mel frequency cepstral coefficients (MFCCs) that are estimated per frame.

      :type: pydantic.PositiveInt, default=4

   .. attribute:: mfcc_lifter

      Cepstral liftering coefficient for MFCC estimation. Must be >= 0. If zero, no liftering is applied.

      :type: pydantic.NonNegativeFloat, default=22.0

   .. py:method:: from_yaml(filename: str)
      :classmethod:

      Load a voice configuration object from a YAML file.

      Uses safe YAML loading (only supports native YAML but no Python tags). Converts loaded YAML sequences to tuples.

      :param filename: Path to the YAML file. Must have a .yml or .yaml ending.
      :type filename: str

   .. py:method:: write_yaml(filename: str)

      Write a voice configuration object to a YAML file.

      Uses safe YAML dumping (only supports native YAML but no Python tags).

      :param filename: Path to the YAML file. Must have a .yml or .yaml ending.
      :type filename: str


.. py:class:: VoiceFeatures(**data: Any)

   Class for storing voice features.

   Features are stored as lists (like columns of a data frame). Optional features are initialized as empty lists.

   .. attribute:: frame

      The frame index for which features were extracted. Must be non-negative and in ascending order.

      :type: typing.List[pydantic.NonNegativeInt]

   .. attribute:: time

      The time stamp at which features were extracted. Must be non-negative and in ascending order.

      :type: typing.List[pydantic.NonNegativeFloat]

   .. py:attribute:: model_config

      Configuration for the model. Should be a dictionary conforming to :class:`pydantic.config.ConfigDict`.

   .. py:method:: from_json(filename: str, extra_filename: Optional[str] = None, encoding: str = 'utf-8')
      :classmethod:

      Load data from a JSON file.

      :param filename: Name of the JSON file from which the object should be loaded. Must have a .json ending.
      :type filename: str


.. py:class:: SegmentData(**data: Any)

   Class for storing speech segment data.

   .. attribute:: name

      Speaker label.

      :type: str

   .. attribute:: conf

      Confidence of speaker label.

      :type: ProbFloat, optional, default=None


.. py:class:: SpeakerAnnotation(**data: Any)

   Class for storing speaker and speech segment annotations.

   .. attribute:: filename

      Name of the annotated audio file. Must be a valid path.

      :type: pydantic.FilePath

   .. attribute:: channel

      Channel index.

      :type: int, optional

   .. attribute:: speaker_average_embeddings

      Average embedding vector for each speaker label.

      :type: typing.Dict[FloatToStr, typing.List[float]], optional

   .. attribute:: segments

      Stores speech segments as :class:`intervaltree.Interval`. Speaker labels are stored in :class:`SegmentData` objects in the :attr:`data` attribute of each interval.

      :type: intervaltree.IntervalTree, optional

   .. py:method:: from_pyannote(annotation: pyannote.core.Annotation, embeddings: Optional[Dict[str, List[float]]] = None)
      :classmethod:

      Create a :class:`SpeakerAnnotation` object from a :class:`pyannote.core.Annotation` object.

      :param annotation: Annotation object containing speech segments and speaker labels.
      :type annotation: pyannote.core.Annotation

   .. py:method:: from_rttm(filename: str, extra_filename: Optional[str] = None)
      :classmethod:

      Load a speaker annotation from an RTTM file.

      :param filename: Path to the file. Must have an RTTM ending.
      :type filename: str

   .. py:method:: write_rttm(filename: str)

      Write a speaker annotation to an RTTM file.

      :param filename: Path to the file. Must have an RTTM ending.
      :type filename: str


.. py:class:: TranscriptionData(**data: Any)

   Class for storing transcription data.

   .. attribute:: index

      Index of the transcribed sentence.

      :type: int

   .. attribute:: text

      Transcribed text.

      :type: str

   .. attribute:: speaker

      Speaker of the transcribed text.

      :type: str, optional, default=None

   .. attribute:: confidence

      Average word probability of transcribed text.

      :type: ProbFloat, optional, default=None


.. py:class:: AudioTranscription(**data: Any)

   Class for storing audio transcriptions.

   .. attribute:: filename

      Name of the transcribed audio file. Must be a valid path.

      :type: pydantic.FilePath

   .. attribute:: segments

      Interval tree containing the transcribed speech segments split into sentences as intervals. The transcribed sentences are stored in the `data` attribute of each interval.

      :type: intervaltree.IntervalTree, optional, default=None

   .. py:property:: subtitles

      Deprecated alias for `segments`.

   .. py:method:: from_srt(filename: str, extra_filename: Optional[str] = None)
      :classmethod:

      Load an audio transcription from an SRT file.

      :param filename: Name of the file to be loaded. Must have an .srt ending.
      :type filename: str

   .. py:method:: write_srt(filename: str)

      Write an audio transcription to an SRT file.

      :param filename: Name of the file to write to. Must have an .srt ending.
      :type filename: str


.. py:class:: SentimentData(**data: Any)

   Class for storing sentiment data.

   .. attribute:: text

      Text of the sentence for which sentiment scores were predicted.

      :type: str

   .. attribute:: pos

      Positive sentiment score.

      :type: ProbFloat

   .. attribute:: neg

      Negative sentiment score.

      :type: ProbFloat

   .. attribute:: neu

      Neutral sentiment score.

      :type: ProbFloat


.. py:class:: SentimentAnnotation(**data: Any)

   Class for storing sentiment scores of transcribed sentences.

   Stores sentiment scores as intervals in an interval tree. The scores are stored in the `data` attribute of each interval.

   .. attribute:: filename

      Name of the file from which sentiment was extracted. Must be a valid path.

      :type: pydantic.FilePath


.. py:class:: Multimodal(**data: Any)

   Class for storing multimodal features.

   See the :ref:`Output` section for details.

   .. attribute:: filename

      Name of the video file. Must be a valid path.

      :type: pydantic.FilePath

   .. attribute:: duration

      Video duration in seconds.

      :type: pydantic.NonNegativeFloat, optional, default=None

   .. attribute:: fps

      Frames per second.

      :type: pydantic.PositiveFloat

   .. attribute:: fps_adjusted

      Frames per second adjusted for skipped frames. Mostly needed for internal computations.

      :type: pydantic.PositiveFloat

   .. attribute:: video_annotation

      Object containing facial features.

      :type: VideoAnnotation

   .. attribute:: audio_annotation

      Object containing speech segments and speakers.

      :type: SpeakerAnnotation

   .. attribute:: voice_features

      Object containing voice features.

      :type: VoiceFeatures

   .. attribute:: transcription

      Object containing transcribed speech segments split into sentences.

      :type: AudioTranscription

   .. attribute:: sentiment

      Object containing sentiment scores for transcribed sentences.

      :type: SentimentAnnotation

   .. attribute:: features

      Merged features stored in a :class:`polars.LazyFrame` object that uses lazy evaluation. To trigger evaluation, call :meth:`polars.LazyFrame.collect`.

      :type: polars.LazyFrame

   .. py:attribute:: model_config

      Configuration for the model. Should be a dictionary conforming to :class:`pydantic.config.ConfigDict`.

   .. py:method:: merge_features() -> polars.LazyFrame

      Merge multimodal features from pipeline components into a common data frame.

      Transforms and merges the available output stored in the `Multimodal` object based on the `'frame'` variable. Stores the merged features as a :class:`polars.LazyFrame` in the `features` attribute.

      :returns: Merged multimodal features.
      :rtype: polars.LazyFrame
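
As a rough illustration of what merging on the `'frame'` variable means, the following plain-Python sketch joins two column-oriented feature tables on their frame index. It is not mexca's implementation and uses neither mexca nor polars. The helper `merge_on_frame` and the column `pitch_f0` are hypothetical names chosen for this example, and filling missing frames with `None` is an assumption modelled on `EMPTY_VALUE`.

```python
from typing import Any, Dict, List

# A column-oriented table: column name -> list of values (one per row).
Columns = Dict[str, List[Any]]


def merge_on_frame(left: Columns, right: Columns) -> Columns:
    """Outer-join two column-oriented tables on their 'frame' column."""
    # Collect every frame index that occurs in either table, in ascending order.
    frames = sorted(set(left["frame"]) | set(right["frame"]))

    merged: Columns = {"frame": frames}
    for table in (left, right):
        # Map each frame index to its row position in this table.
        row_of = {frame: row for row, frame in enumerate(table["frame"])}
        for name, values in table.items():
            if name == "frame":
                continue
            # Frames missing from this table are filled with None (assumed fill value).
            merged[name] = [
                values[row_of[f]] if f in row_of else None for f in frames
            ]
    return merged


# Hypothetical per-frame outputs of two pipeline components.
video = {"frame": [0, 1, 2], "face_prob": [0.99, 0.97, 0.98]}
voice = {"frame": [1, 2, 3], "pitch_f0": [110.0, 112.5, 111.0]}

merged = merge_on_frame(video, voice)
print(merged)
```

The sketch evaluates eagerly. The `features` attribute described above is a lazy frame, so the actual merge is deferred until the frame is collected.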