mexca.data
Objects for storing multimodal data.
Module Contents
Classes
- Base class for storing segment data.
- BaseFeatures: Base class for storing features.
- BaseAnnotation: Base class for storing annotations.
- VideoAnnotation: Video annotation class for storing facial features.
- VoiceFeaturesConfig: Configure the calculation of signal properties used for voice feature extraction.
- VoiceFeatures: Class for storing voice features.
- SegmentData: Class for storing speech segment data.
- SpeakerAnnotation: Class for storing speaker and speech segment annotations.
- Class for storing transcription data.
- AudioTranscription: Class for storing audio transcriptions.
- Class for storing sentiment data.
- SentimentAnnotation: Class for storing sentiment scores of transcribed sentences.
- Multimodal: Class for storing multimodal features.
Attributes
- Value that is returned if a feature is not present.
- ProbFloat: Probability float type.
- FloatToStr: Convert floats or integers to strings.
- FloatOrNone: Convert nan float types to None types.
- mexca.data.FloatToStr[source]
Convert floats or integers to strings.
Type that converts a float or integer to a string. Returns None for types other than float, int, and str.
- mexca.data.FloatOrNone[source]
Convert nan float types to None types.
Type that converts a float that is nan into a None type. Also returns None for None types.
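The conversion semantics of these two types can be sketched in plain Python. This is an illustrative stand-in, not mexca's actual validator implementation (the real types are applied during pydantic validation):

```python
import math


def float_to_str(value):
    """Mimic FloatToStr: floats and ints become strings, strings pass
    through unchanged, and any other type maps to None."""
    if isinstance(value, (float, int)):
        return str(value)
    if isinstance(value, str):
        return value
    return None


def float_or_none(value):
    """Mimic FloatOrNone: nan floats and None map to None; other
    values pass through unchanged."""
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    return value
```

For example, `float_to_str(2)` yields `"2"`, while `float_or_none(float("nan"))` yields `None`.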
- class mexca.data.BaseFeatures(**data: Any)[source]
Base class for storing features.
- __eq__(other: BaseFeatures) bool [source]
Return self==value.
- class mexca.data.BaseAnnotation(**data: Any)[source]
Base class for storing annotations.
- segments[source]
Interval tree containing intervaltree.Interval annotation segments. Annotation data is stored in the data attribute of each intervaltree.Interval.
- Type:
intervaltree.IntervalTree, optional, default=None
- class mexca.data.VideoAnnotation(**data: Any)[source]
Video annotation class for storing facial features.
- frame[source]
Index of each frame. Must be non-negative and in ascending order.
- Type:
List[pydantic.NonNegativeInt], default=list()
- time[source]
Timestamp of each frame in seconds. Must be non-negative and in ascending order.
- Type:
List[pydantic.NonNegativeFloat], default=list()
- face_prob[source]
Probability of a detected face. Is None if no face was detected.
- Type:
List[ProbFloat], optional, default=list()
- face_aus[source]
Facial action unit activations of a detected face. Is None if no face was detected.
- face_label[source]
Label of a detected face. Is None if no face was detected.
- Type:
List[Float2Str], optional, default=list()
- face_embeddings[source]
Embedding vector (list of 512 float elements) for each detected face in the input video.
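The column-wise layout and the ordering constraints documented above can be illustrated with a plain dataclass stand-in. mexca's real class is a pydantic model; the field names below mirror the documented attributes, but the validation logic is a sketch:

```python
from dataclasses import dataclass, field
from typing import List, Optional


def check_non_negative_ascending(values, name):
    """Enforce the documented constraint: non-negative and ascending."""
    if any(v < 0 for v in values):
        raise ValueError(f"{name} must be non-negative")
    if any(b < a for a, b in zip(values, values[1:])):
        raise ValueError(f"{name} must be in ascending order")


@dataclass
class VideoAnnotationSketch:
    """Column-wise storage: the i-th element of every list describes frame i."""
    frame: List[int] = field(default_factory=list)
    time: List[float] = field(default_factory=list)
    face_prob: List[Optional[float]] = field(default_factory=list)
    face_label: List[Optional[str]] = field(default_factory=list)

    def __post_init__(self):
        check_non_negative_ascending(self.frame, "frame")
        check_non_negative_ascending(self.time, "time")


ann = VideoAnnotationSketch(
    frame=[0, 1, 2],
    time=[0.0, 0.04, 0.08],
    face_prob=[0.98, None, 0.91],  # None where no face was detected
    face_label=["0", None, "0"],
)
```

Constructing the sketch with an out-of-order frame list (e.g. `frame=[2, 1]`) raises a ValueError, matching the documented "non-negative and in ascending order" requirement.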
- class mexca.data.VoiceFeaturesConfig(**data: Any)[source]
Configure the calculation of signal properties used for voice feature extraction.
Create a pseudo-immutable object with attributes that are recognized by the VoiceExtractor class and forwarded as arguments to signal property objects defined in mexca.audio.features. Details can be found in the feature class documentation.
- hop_len[source]
Number of samples between frame starting points.
- Type:
pydantic.PositiveInt, default=256
- center[source]
Whether the signal has been centered and padded before framing.
- Type:
bool, default=True
- pad_mode[source]
How the signal has been padded before framing. See numpy.pad(). Uses the default value 0 for 'constant' padding.
- Type:
str, default='constant'
- spec_window[source]
The window that is applied before the STFT to obtain spectra.
- Type:
_Window, default=”hann”
- pitch_lower_freq[source]
Lower limit used for pitch estimation (in Hz).
- Type:
pydantic.NonNegativeFloat, default=75.0
- pitch_upper_freq[source]
Upper limit used for pitch estimation (in Hz).
- Type:
pydantic.NonNegativeFloat, default=600.0
- pitch_n_harmonics
Number of estimated pitch harmonics.
- Type:
pydantic.PositiveInt, default=100
- pitch_pulse_lower_period[source]
Lower limit for periods between glottal pulses for jitter and shimmer extraction.
- Type:
pydantic.PositiveFloat, default=0.0001
- pitch_pulse_upper_period[source]
Upper limit for periods between glottal pulses for jitter and shimmer extraction.
- Type:
pydantic.PositiveFloat, default=0.02
- pitch_pulse_max_period_ratio[source]
Maximum ratio between consecutive glottal periods for jitter and shimmer extraction.
- Type:
pydantic.PositiveFloat, default=1.3
- pitch_pulse_max_amp_factor[source]
Maximum ratio between consecutive amplitudes used for shimmer extraction.
- Type:
pydantic.PositiveFloat, default=1.6
- hnr_lower_freq[source]
Lower fundamental frequency limit for choosing pitch candidates when computing the harmonics-to-noise ratio (HNR).
- Type:
pydantic.PositiveFloat, default=75.0
- hnr_rel_silence_threshold[source]
Relative threshold for treating signal frames as silent when computing the HNR.
- Type:
pydantic.PositiveFloat, default=0.1
- formants_max[source]
The maximum number of formants that are extracted.
- Type:
pydantic.PositiveInt, default=5
- formants_lower_freq[source]
Lower limit for formant frequencies (in Hz).
- Type:
pydantic.NonNegativeFloat, default=50.0
- formants_upper_freq[source]
Upper limit for formant frequencies (in Hz).
- Type:
pydantic.NonNegativeFloat, default=5450.0
- formants_signal_preemphasis_from[source]
Starting value for the applied preemphasis function (in Hz).
- Type:
pydantic.NonNegativeFloat, optional, default=50.0
- formants_window[source]
Window function that is applied before formant estimation.
- Type:
_Window, default=”praat_gaussian”
- formants_amp_lower[source]
Lower boundary for formant peak amplitude search interval.
- Type:
pydantic.PositiveFloat, optional, default=0.8
- formants_amp_upper[source]
Upper boundary for formant peak amplitude search interval.
- Type:
pydantic.PositiveFloat, optional, default=1.2
- formants_amp_rel_f0[source]
Whether the formant amplitude is divided by the fundamental frequency amplitude.
- Type:
bool, optional, default=True
- alpha_ratio_lower_band[source]
Boundaries of the alpha ratio lower frequency band (start, end) in Hz.
- Type:
tuple, default=(50.0, 1000.0)
- alpha_ratio_upper_band[source]
Boundaries of the alpha ratio upper frequency band (start, end) in Hz.
- Type:
tuple, default=(1000.0, 5000.0)
- hammar_index_pivot_point_freq[source]
Point separating the Hammarberg index lower and upper frequency regions in Hz.
- Type:
pydantic.PositiveFloat, default=2000.0
- hammar_index_upper_freq[source]
Upper limit for the Hammarberg index upper frequency region in Hz.
- Type:
pydantic.PositiveFloat, default=5000.0
- spectral_slopes_bands[source]
Frequency bands in Hz for which spectral slopes are estimated.
- Type:
tuple, default=((0.0, 500.0), (500.0, 1500.0))
- mel_spec_lower_freq[source]
Lower frequency boundary for Mel spectrogram transformation in Hz.
- Type:
pydantic.NonNegativeFloat, default=20.0
- mel_spec_upper_freq[source]
Upper frequency boundary for Mel spectrogram transformation in Hz.
- Type:
pydantic.NonNegativeFloat, default=8000.0
- mfcc_n[source]
Number of Mel frequency cepstral coefficients (MFCCs) that are estimated per frame.
- Type:
pydantic.PositiveInt, default=4
- mfcc_lifter[source]
Cepstral liftering coefficient for MFCC estimation. Must be >= 0. If zero, no liftering is applied.
- Type:
pydantic.NonNegativeFloat, default=22.0
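The "pseudo-immutable" behavior of the config can be approximated with a frozen dataclass. The field names and defaults below are taken from the attribute list above, but this is a sketch, not mexca's pydantic implementation:

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class VoiceConfigSketch:
    # A subset of the documented attributes with their documented defaults.
    hop_len: int = 256
    center: bool = True
    pad_mode: str = "constant"
    pitch_lower_freq: float = 75.0
    pitch_upper_freq: float = 600.0
    mfcc_n: int = 4


config = VoiceConfigSketch(pitch_upper_freq=500.0)  # override at construction

try:
    config.hop_len = 512  # mutation after construction is rejected
except FrozenInstanceError:
    pass
```

Values are set once at construction and then read-only, which is what makes it safe to share one config across the signal property objects it is forwarded to.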
- class mexca.data.VoiceFeatures(**data: Any)[source]
Class for storing voice features.
Features are stored as lists (like columns of a data frame). Optional features are initialized as empty lists.
- frame[source]
The frame index for which features were extracted. Must be non-negative and in ascending order.
- Type:
List[pydantic.NonNegativeInt]
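Frame indices can be mapped back to timestamps given the audio sampling rate and the hop_len from VoiceFeaturesConfig. The formula below is the standard STFT framing relation; treating it as how mexca's frame indices relate to time is an assumption for illustration, not documented API:

```python
def frame_to_time(frame_idx, hop_len=256, sr=16000):
    """Convert a frame index to seconds under standard STFT framing,
    assuming a hop of `hop_len` samples at sampling rate `sr`."""
    return frame_idx * hop_len / sr


times = [frame_to_time(i) for i in (0, 1, 2)]
```

With the default hop_len of 256 and a 16 kHz signal, consecutive frames are 16 ms apart.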
- class mexca.data.SpeakerAnnotation(**data: Any)[source]
Class for storing speaker and speech segment annotations.
- segments[source]
Stores speech segments as intervaltree.Interval. Speaker labels are stored in SegmentData objects in the data attribute of each interval.
- Type:
intervaltree.IntervalTree, optional
- classmethod from_pyannote(annotation: pyannote.core.Annotation, embeddings: Dict[str, List[float]] | None = None)[source]
Create a SpeakerAnnotation object from a pyannote.core.Annotation object.
- Parameters:
annotation (pyannote.core.Annotation) – Annotation object containing speech segments and speaker labels.
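The conversion performed by from_pyannote can be sketched without pyannote installed. The (start, end, label) tuple layout below mimics pyannote's segment iteration, and `SegmentDataSketch` with its `name` field is a hypothetical stand-in for mexca's SegmentData:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SegmentDataSketch:
    """Stand-in for mexca's SegmentData: carries the speaker label."""
    name: str


def segments_from_tuples(tracks: List[Tuple[float, float, str]]):
    """Build (begin, end, data) triples, as they would be inserted
    into an intervaltree.IntervalTree."""
    return [(start, end, SegmentDataSketch(name=label))
            for start, end, label in tracks]


intervals = segments_from_tuples([(0.0, 2.5, "SPEAKER_00"),
                                  (2.5, 4.0, "SPEAKER_01")])
```

Each triple carries the speaker label in its data element, matching the documented layout where annotation data lives in the data attribute of each interval.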
- class mexca.data.AudioTranscription(**data: Any)[source]
Class for storing audio transcriptions.
- segments[source]
Interval tree containing the transcribed speech segments split into sentences as intervals. The transcribed sentences are stored in the data attribute of each interval.
- Type:
intervaltree.IntervalTree, optional, default=None
- class mexca.data.SentimentAnnotation(**data: Any)[source]
Class for storing sentiment scores of transcribed sentences.
Stores sentiment scores as intervals in an interval tree. The scores are stored in the data attribute of each interval.
- class mexca.data.Multimodal(**data: Any)[source]
Class for storing multimodal features.
See the Output section for details.
- duration[source]
Video duration in seconds.
- Type:
pydantic.NonNegativeFloat, optional, default=None
- fps_adjusted[source]
Frames per second adjusted for skipped frames. Mostly needed for internal computations.
- Type:
pydantic.PositiveFloat
- features[source]
Merged features stored in a polars.LazyFrame object that uses lazy evaluation. To trigger evaluation, the collect() method can be called.
- Type:
polars.LazyFrame
- merge_features() polars.LazyFrame [source]
Merge multimodal features from pipeline components into a common data frame.
Transforms and merges the available output stored in the Multimodal object based on the 'frame' variable. Stores the merged features as a polars.LazyFrame in the features attribute.
- Returns:
Merged multimodal features.
- Return type:
polars.LazyFrame
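The merge that merge_features performs can be illustrated schematically in plain Python. Real mexca uses polars and handles many more columns; treat this as a conceptual sketch of outer-joining per-modality outputs on the shared 'frame' variable, with hypothetical column names:

```python
def merge_on_frame(video_rows, audio_rows):
    """Outer-join two per-frame feature tables (dicts keyed by frame
    index), filling missing modalities with None."""
    frames = sorted(set(video_rows) | set(audio_rows))
    merged = []
    for f in frames:
        row = {"frame": f}
        row.update(video_rows.get(f, {"face_prob": None}))
        row.update(audio_rows.get(f, {"pitch": None}))
        merged.append(row)
    return merged


video = {0: {"face_prob": 0.99}, 1: {"face_prob": 0.95}}
audio = {1: {"pitch": 120.0}, 2: {"pitch": 118.5}}
table = merge_on_frame(video, audio)
```

Frames present in only one modality keep None in the other modality's columns, which is why optional features in the merged frame can contain missing values.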