mexca.data

Objects for storing multimodal data.

Module Contents

Classes

BaseData

Base class for storing segment data.

BaseFeatures

Base class for storing features.

BaseAnnotation

Base class for storing annotations.

VideoAnnotation

Video annotation class for storing facial features.

VoiceFeaturesConfig

Configure the calculation of signal properties used for voice feature extraction.

VoiceFeatures

Class for storing voice features.

SegmentData

Class for storing speech segment data.

SpeakerAnnotation

Class for storing speaker and speech segment annotations.

TranscriptionData

Class for storing transcription data.

AudioTranscription

Class for storing audio transcriptions.

SentimentData

Class for storing sentiment data.

SentimentAnnotation

Class for storing sentiment scores of transcribed sentences.

Multimodal

Class for storing multimodal features.

Attributes

EMPTY_VALUE

Value that is returned if a feature is not present.

ProbFloat

Probability float type.

FloatToStr

Convert floats or integers to strings.

FloatOrNone

Convert nan float types to None types.

mexca.data.EMPTY_VALUE[source]

Value that is returned if a feature is not present.

mexca.data.ProbFloat[source]

Probability float type.

Restricts the range to [0, 1].

mexca.data.FloatToStr[source]

Convert floats or integers to strings.

Type that converts a float or integer to a string. Returns None for types other than float, int, and str.

mexca.data.FloatOrNone[source]

Convert nan float types to None types.

Type that converts a float that is nan to None. Also returns None for None inputs.
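
As a brief illustration of these aliases in use, the following sketch relies on the SentimentData class documented further below; the values are hypothetical, and it is assumed that pydantic validation rejects out-of-range ProbFloat scores.

    import pydantic

    from mexca.data import SentimentData

    # ProbFloat fields (pos, neg, neu) accept scores in [0, 1]
    SentimentData(text="Hello world.", pos=0.2, neg=0.1, neu=0.7)

    try:
        # A score outside [0, 1] should fail ProbFloat validation
        SentimentData(text="Hello world.", pos=1.5, neg=0.1, neu=0.7)
    except pydantic.ValidationError as err:
        print(err)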

class mexca.data.BaseData(**data: Any)[source]

Base class for storing segment data.

class mexca.data.BaseFeatures(**data: Any)[source]

Base class for storing features.

filename[source]

Path to the video file. Must be a valid path.

Type:

pydantic.FilePath

__eq__(other: BaseFeatures) → bool[source]

Return self==value.

classmethod from_json(filename: str, extra_filename: str | None = None, encoding: str = 'utf-8')[source]

Load data from a JSON file.

Parameters:

filename (str) – Name of the JSON file from which the object should be loaded. Must have a .json ending.

write_json(filename: str, encoding: str = 'utf-8')[source]

Store data in a JSON file.

Parameters:

filename (str) – Name of the destination file. Must have a .json ending.
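
A minimal round-trip sketch; it assumes a file video.mp4 exists in the working directory, and the same pattern applies to the subclasses documented below (e.g., VideoAnnotation, VoiceFeatures).

    from mexca.data import BaseFeatures

    # Store the object in a JSON file and load it back later
    features = BaseFeatures(filename="video.mp4")  # filename must point to an existing file
    features.write_json("features.json")

    reloaded = BaseFeatures.from_json("features.json")
    assert reloaded == features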

class mexca.data.BaseAnnotation(**data: Any)[source]

Base class for storing annotations.

filename[source]

Name of annotated file. Must be a valid path.

Type:

pydantic.FilePath

segments[source]

Interval tree containing intervaltree.Interval annotation segments. Annotation data is stored in the data attribute of each intervaltree.Interval.

Type:

intervaltree.IntervalTree, optional, default=None

classmethod from_json(filename: str, extra_filename: str | None = None, encoding: str = 'utf-8')[source]

Load data from a JSON file.

Parameters:

filename (str) – Name of the JSON file from which the object should be loaded. Must have a .json ending.

write_json(filename: str, encoding: str = 'utf-8')[source]

Store data in a JSON file.

Parameters:

filename (str) – Name of the destination file. Must have a .json ending.

class mexca.data.VideoAnnotation(**data: Any)[source]

Video annotation class for storing facial features.

frame[source]

Index of each frame. Must be non-negative and in ascending order.

Type:

List[pydantic.NonNegativeInt], default=list()

time[source]

Timestamp of each frame in seconds. Must be non-negative and in ascending order.

Type:

List[pydantic.NonNegativeFloat], default=list()

face_box[source]

Bounding box of a detected face. Is None if no face was detected.

Type:

List[Optional[List[pydantic.NonNegativeFloat]]], optional, default=list()

face_prob[source]

Probability of a detected face. Is None if no face was detected.

Type:

List[ProbFloat], optional, default=list()

face_landmarks[source]

Facial landmarks of a detected face. Is None if no face was detected.

Type:

List[Optional[List[List[pydantic.NonNegativeFloat]]]], optional, default=list()

face_aus[source]

Facial action unit activations of a detected face. Is None if no face was detected.

Type:

List[Optional[List[ProbFloat]]], optional, default=list()

face_label[source]

Label of a detected face. Is None if no face was detected.

Type:

List[FloatToStr], optional, default=list()

face_embeddings[source]

Embedding vector (list of 512 float elements) for each detected face in the input video.

Type:

List[Optional[List[float]]], optional, default=list()

face_confidence[source]

Confidence of the face_label assignment. Is None if no face was detected or only one face label was assigned.

Type:

List[ProbFloat], optional, default=list()

face_average_embeddings[source]

Average embedding vector (list of 512 float elements) for each face in the input video.

Type:

Dict[FloatToStr, List[float]], optional, default=dict()
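
A sketch of constructing a VideoAnnotation by hand with hypothetical values for two frames; it assumes video.mp4 exists. In a mexca pipeline, these fields are typically filled by the face extraction component.

    from mexca.data import VideoAnnotation

    annotation = VideoAnnotation(
        filename="video.mp4",
        frame=[0, 5],
        time=[0.0, 0.2],
        face_box=[[10.0, 20.0, 110.0, 120.0], None],  # no face detected in the second frame
        face_prob=[0.99, None],
        face_label=["0", None],
    )
    annotation.write_json("video_annotation.json")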

class mexca.data.VoiceFeaturesConfig(**data: Any)[source]

Configure the calculation of signal properties used for voice feature extraction.

Create a pseudo-immutable object with attributes that are recognized by the VoiceExtractor class and forwarded as arguments to signal property objects defined in mexca.audio.features. Details can be found in the feature class documentation.

frame_len[source]

Number of samples per frame.

Type:

pydantic.PositiveInt, default=1024

hop_len[source]

Number of samples between frame starting points.

Type:

pydantic.PositiveInt, default=256

center[source]

Whether the signal has been centered and padded before framing.

Type:

bool, default=True

pad_mode[source]

How the signal has been padded before framing. See numpy.pad(). Uses the default value 0 for ‘constant’ padding.

Type:

str, default=’constant’

spec_window[source]

The window that is applied before the STFT to obtain spectra.

Type:

_Window, default=”hann”

pitch_lower_freq[source]

Lower limit used for pitch estimation (in Hz).

Type:

pydantic.NonNegativeFloat, default=75.0

pitch_upper_freq[source]

Upper limit used for pitch estimation (in Hz).

Type:

pydantic.NonNegativeFloat, default=600.0

pitch_method[source]

Method used for estimating voice pitch.

Type:

str, default=”pyin”

pitch_n_harmonics[source]

Number of estimated pitch harmonics.

Type:

pydantic.PositiveInt, default=100

pitch_pulse_lower_period[source]

Lower limit for periods between glottal pulses for jitter and shimmer extraction.

Type:

pydantic.PositiveFloat, default=0.0001

pitch_pulse_upper_period[source]

Upper limit for periods between glottal pulses for jitter and shimmer extraction.

Type:

pydantic.PositiveFloat, default=0.02

pitch_pulse_max_period_ratio[source]

Maximum ratio between consecutive glottal periods for jitter and shimmer extraction.

Type:

pydantic.PositiveFloat, default=1.3

pitch_pulse_max_amp_factor[source]

Maximum ratio between consecutive amplitudes used for shimmer extraction.

Type:

pydantic.PositiveFloat, default=1.6

jitter_rel[source]

Divide jitter by the average pitch period.

Type:

bool, default=True

shimmer_rel[source]

Divide shimmer by the average pulse amplitude.

Type:

bool, default=True

hnr_lower_freq[source]

Lower fundamental frequency limit for choosing pitch candidates when computing the harmonics-to-noise ratio (HNR).

Type:

pydantic.PositiveFloat, default=75.0

hnr_rel_silence_threshold[source]

Relative threshold for treating signal frames as silent when computing the HNR.

Type:

pydantic.PositiveFloat, default=0.1

formants_max[source]

The maximum number of formants that are extracted.

Type:

pydantic.PositiveInt, default=5

formants_lower_freq[source]

Lower limit for formant frequencies (in Hz).

Type:

pydantic.NonNegativeFloat, default=50.0

formants_upper_freq[source]

Upper limit for formant frequencies (in Hz).

Type:

pydantic.NonNegativeFloat, default=5450.0

formants_signal_preemphasis_from[source]

Starting value for the applied preemphasis function (in Hz).

Type:

pydantic.NonNegativeFloat, optional, default=50.0

formants_window[source]

Window function that is applied before formant estimation.

Type:

_Window, default=”praat_gaussian”

formants_amp_lower[source]

Lower boundary for formant peak amplitude search interval.

Type:

pydantic.PositiveFloat, optional, default=0.8

formants_amp_upper[source]

Upper boundary for formant peak amplitude search interval.

Type:

pydantic.PositiveFloat, optional, default=1.2

formants_amp_rel_f0[source]

Whether the formant amplitude is divided by the fundamental frequency amplitude.

Type:

bool, optional, default=True

alpha_ratio_lower_band[source]

Boundaries of the alpha ratio lower frequency band (start, end) in Hz.

Type:

tuple, default=(50.0, 1000.0)

alpha_ratio_upper_band[source]

Boundaries of the alpha ratio upper frequency band (start, end) in Hz.

Type:

tuple, default=(1000.0, 5000.0)

hammar_index_pivot_point_freq[source]

Point separating the Hammarberg index lower and upper frequency regions in Hz.

Type:

pydantic.PositiveFloat, default=2000.0

hammar_index_upper_freq[source]

Upper limit for the Hammarberg index upper frequency region in Hz.

Type:

pydantic.PositiveFloat, default=5000.0

spectral_slopes_bands[source]

Frequency bands in Hz for which spectral slopes are estimated.

Type:

tuple, default=((0.0, 500.0), (500.0, 1500.0))

mel_spec_n_mels[source]

Number of Mel filters.

Type:

pydantic.PositiveInt, default=26

mel_spec_lower_freq[source]

Lower frequency boundary for Mel spectrogram transformation in Hz.

Type:

pydantic.NonNegativeFloat, default=20.0

mel_spec_upper_freq[source]

Upper frequency boundary for Mel spectrogram transformation in Hz.

Type:

pydantic.NonNegativeFloat, default=8000.0

mfcc_n[source]

Number of Mel frequency cepstral coefficients (MFCCs) that are estimated per frame.

Type:

pydantic.PositiveInt, default=4

mfcc_lifter[source]

Cepstral liftering coefficient for MFCC estimation. Must be >= 0. If zero, no liftering is applied.

Type:

pydantic.NonNegativeFloat, default=22.0

classmethod from_yaml(filename: str)[source]

Load a voice configuration object from a YAML file.

Uses safe YAML loading (supports only native YAML tags, not Python-specific tags). Converts loaded YAML sequences to tuples.

Parameters:

filename (str) – Path to the YAML file. Must have a .yml or .yaml ending.

write_yaml(filename: str)[source]

Write a voice configuration object to a YAML file.

Uses safe YAML dumping (supports only native YAML tags, not Python-specific tags).

Parameters:

filename (str) – Path to the YAML file. Must have a .yml or .yaml ending.
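
A sketch of customizing, persisting, and restoring a configuration; the field names are taken from the attribute list above, and unspecified fields keep their defaults.

    from mexca.data import VoiceFeaturesConfig

    # Narrow the pitch search range; all other settings keep their defaults
    config = VoiceFeaturesConfig(pitch_lower_freq=60.0, pitch_upper_freq=500.0)
    config.write_yaml("voice_config.yaml")

    # Restore the same configuration later
    config = VoiceFeaturesConfig.from_yaml("voice_config.yaml")

The restored object can then be forwarded to the VoiceExtractor class as described above.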

class mexca.data.VoiceFeatures(**data: Any)[source]

Class for storing voice features.

Features are stored as lists (like columns of a data frame). Optional features are initialized as empty lists.

frame[source]

The frame index for which features were extracted. Must be non-negative and in ascending order.

Type:

List[pydantic.NonNegativeInt]

time[source]

The time stamp at which features were extracted. Must be non-negative and in ascending order.

Type:

List[pydantic.NonNegativeFloat]

classmethod from_json(filename: str, extra_filename: str | None = None, encoding: str = 'utf-8')[source]

Load data from a JSON file.

Parameters:

filename (str) – Name of the JSON file from which the object should be loaded. Must have a .json ending.
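
A minimal construction sketch, assuming audio.wav exists; additional voice features produced by the extraction step would appear as further list attributes alongside frame and time.

    from mexca.data import VoiceFeatures

    features = VoiceFeatures(
        filename="audio.wav",
        frame=[0, 1, 2],
        time=[0.0, 0.02, 0.04],
    )
    features.write_json("voice_features.json")  # write_json is inherited from BaseFeatures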

class mexca.data.SegmentData(**data: Any)[source]

Class for storing speech segment data.

name[source]

Speaker label.

Type:

str

conf[source]

Confidence of speaker label.

Type:

ProbFloat, optional, default=None

class mexca.data.SpeakerAnnotation(**data: Any)[source]

Class for storing speaker and speech segment annotations.

filename[source]

Name of the annotated audio file. Must be a valid path.

Type:

pydantic.FilePath

channel[source]

Channel index.

Type:

int, optional

speaker_average_embeddings[source]

Average embedding vector for each speaker label.

Type:

Dict[FloatToStr, List[float]], optional

segments[source]

Stores speech segments as intervaltree.Interval. Speaker labels are stored in SegmentData objects in the data attribute of each interval.

Type:

intervaltree.IntervalTree, optional

__str__(end: str = '\t', file: TextIO = sys.stdout, header: bool = True)[source]

Return str(self).

classmethod from_pyannote(annotation: pyannote.core.Annotation, embeddings: Dict[str, List[float]] | None = None)[source]

Create a SpeakerAnnotation object from a pyannote.core.Annotation object.

Parameters:

annotation (pyannote.core.Annotation) – Annotation object containing speech segments and speaker labels.

classmethod from_rttm(filename: str, extra_filename: str | None = None)[source]

Load a speaker annotation from an RTTM file.

Parameters:

filename (str) – Path to the file. Must have an RTTM ending.

write_rttm(filename: str)[source]

Write a speaker annotation to an RTTM file.

Parameters:

filename (str) – Path to the file. Must have an RTTM ending.
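
A sketch of building a speaker annotation by hand and writing it to an RTTM file; the segment boundaries, speaker label, and file names are hypothetical, and audio.wav is assumed to exist.

    from intervaltree import Interval, IntervalTree

    from mexca.data import SegmentData, SpeakerAnnotation

    # One speech segment from 1.5 s to 3.2 s attributed to speaker "0"
    segments = IntervalTree(
        [Interval(1.5, 3.2, SegmentData(name="0", conf=0.87))]
    )
    annotation = SpeakerAnnotation(filename="audio.wav", channel=1, segments=segments)
    annotation.write_rttm("audio.rttm")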

class mexca.data.TranscriptionData(**data: Any)[source]

Class for storing transcription data.

index[source]

Index of the transcribed sentence.

Type:

int

text[source]

Transcribed text.

Type:

str

speaker[source]

Speaker of the transcribed text.

Type:

str, optional, default=None

confidence[source]

Average word probability of transcribed text.

Type:

ProbFloat, optional, default=None

class mexca.data.AudioTranscription(**data: Any)[source]

Class for storing audio transcriptions.

filename[source]

Name of the transcribed audio file. Must be a valid path.

Type:

pydantic.FilePath

segments[source]

Interval tree containing the transcribed speech segments split into sentences as intervals. The transcribed sentences are stored in the data attribute of each interval.

Type:

intervaltree.IntervalTree, optional, default=None

property subtitles[source]

Deprecated alias for segments.

classmethod from_srt(filename: str, extra_filename: str | None = None)[source]

Load an audio transcription from an SRT file.

Parameters:

filename (str) – Name of the file to be loaded. Must have an .srt ending.

write_srt(filename: str)[source]

Write an audio transcription to an SRT file.

Parameters:

filename (str) – Name of the file to write to. Must have an .srt ending.
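
A sketch of the analogous workflow for transcriptions, with hypothetical sentence data; audio.wav is assumed to exist.

    from intervaltree import Interval, IntervalTree

    from mexca.data import AudioTranscription, TranscriptionData

    segments = IntervalTree(
        [Interval(1.5, 3.2, TranscriptionData(index=0, text="Hello world.", speaker="0", confidence=0.95))]
    )
    transcription = AudioTranscription(filename="audio.wav", segments=segments)
    transcription.write_srt("audio.srt")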

class mexca.data.SentimentData(**data: Any)[source]

Class for storing sentiment data.

text[source]

Text of the sentence for which sentiment scores were predicted.

Type:

str

pos[source]

Positive sentiment score.

Type:

ProbFloat

neg[source]

Negative sentiment score.

Type:

ProbFloat

neu[source]

Neutral sentiment score.

Type:

ProbFloat

class mexca.data.SentimentAnnotation(**data: Any)[source]

Class for storing sentiment scores of transcribed sentences.

Stores sentiment scores as intervals in an interval tree. The scores are stored in the data attribute of each interval.

filename[source]

Name of the file from which sentiment was extracted. Must be a valid path.

Type:

pydantic.FilePath
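
A sketch of storing sentiment scores for one transcribed sentence; it assumes the segments attribute inherited from BaseAnnotation and an existing audio.wav, and all scores are hypothetical.

    from intervaltree import Interval, IntervalTree

    from mexca.data import SentimentAnnotation, SentimentData

    sentiment = SentimentAnnotation(
        filename="audio.wav",
        segments=IntervalTree(
            [Interval(1.5, 3.2, SentimentData(text="Hello world.", pos=0.6, neg=0.1, neu=0.3))]
        ),
    )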

class mexca.data.Multimodal(**data: Any)[source]

Class for storing multimodal features.

See the Output section for details.

filename[source]

Name of the video file. Must be a valid path.

Type:

pydantic.FilePath

duration[source]

Video duration in seconds.

Type:

pydantic.NonNegativeFloat, optional, default=None

fps[source]

Frames per second.

Type:

pydantic.PositiveFloat

fps_adjusted[source]

Frames per second, adjusted for skipped frames. Mostly needed for internal computations.

Type:

pydantic.PositiveFloat

video_annotation[source]

Object containing facial features.

Type:

VideoAnnotation

audio_annotation[source]

Object containing speech segments and speakers.

Type:

SpeakerAnnotation

voice_features[source]

Object containing voice features.

Type:

VoiceFeatures

transcription[source]

Object containing transcribed speech segments split into sentences.

Type:

AudioTranscription

sentiment[source]

Object containing sentiment scores for transcribed sentences.

Type:

SentimentAnnotation

features[source]

Merged features stored in a polars.LazyFrame object that uses lazy evaluation. Call the collect() method to trigger evaluation.

Type:

polars.LazyFrame

merge_features() → polars.LazyFrame[source]

Merge multimodal features from pipeline components into a common data frame.

Transforms and merges the available output stored in the Multimodal object based on the ‘frame’ variable. Stores the merged features as a polars.LazyFrame in the features attribute.

Returns:

Merged multimodal features.

Return type:

polars.LazyFrame
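
A sketch of merging component output, assuming video.mp4 exists; the annotation objects are hypothetical placeholders for the output of the corresponding pipeline components.

    from mexca.data import Multimodal

    output = Multimodal(
        filename="video.mp4",
        duration=20.0,
        fps=25.0,
        fps_adjusted=5.0,
        video_annotation=video_annotation,  # VideoAnnotation from face extraction
        audio_annotation=audio_annotation,  # SpeakerAnnotation from speaker identification
        voice_features=voice_features,      # VoiceFeatures from voice feature extraction
        transcription=transcription,        # AudioTranscription from transcription
        sentiment=sentiment,                # SentimentAnnotation from sentiment extraction
    )

    features = output.merge_features()  # polars.LazyFrame
    df = features.collect()             # trigger lazy evaluation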