mexca.audio.extraction
Extract voice features from an audio file.
Construct a dictionary with keys as feature names and values as feature objects. The dictionary can
be used to extract the specified features with the VoiceExtractor. Feature objects require
lower-level voice signal properties, which are defined in the requires() method of each
feature class. The VoiceExtractor class computes the properties and supplies them to
the feature objects.
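The mechanism described above can be sketched with stand-in classes. Everything in this block is hypothetical: ToyPitchFrames, ToyFeaturePitch, and ToyExtractor are not mexca classes, and the nearest-frame lookup in apply() is a simplification. The sketch only illustrates how an extractor might match the types declared by requires() against properties it has computed and attach them as attributes before calling each feature in the dictionary.

```python
from typing import Dict, List, Optional


class ToyPitchFrames:
    """Hypothetical stand-in for a lower-level voice signal property."""

    def __init__(self, ts: List[float], f0: List[float]):
        self.ts = ts  # frame time stamps in seconds
        self.f0 = f0  # one F0 estimate per frame in Hz


class ToyFeaturePitch:
    """Hypothetical feature object that declares what it needs."""

    def requires(self) -> Optional[Dict[str, type]]:
        # Key: attribute name to set; value: type of the required object.
        return {"pitch_frames": ToyPitchFrames}

    def apply(self, time: List[float]) -> List[float]:
        # Simplification: return the value of the nearest frame.
        frames = self.pitch_frames
        indices = range(len(frames.ts))
        return [
            frames.f0[min(indices, key=lambda i: abs(frames.ts[i] - t))]
            for t in time
        ]


class ToyExtractor:
    """Hypothetical extractor that supplies required objects by type."""

    def __init__(self, features: Dict[str, object]):
        self.features = features

    def extract(self, properties: List[object], time: List[float]) -> Dict[str, List[float]]:
        out = {}
        for name, feature in self.features.items():
            for attr, required_type in (feature.requires() or {}).items():
                # Find the first computed property of the required type.
                match = next(p for p in properties if isinstance(p, required_type))
                setattr(feature, attr, match)
            out[name] = feature.apply(time)
        return out


# Feature dictionary: keys are feature names, values are feature objects.
features = {"pitch_f0": ToyFeaturePitch()}
properties = [ToyPitchFrames(ts=[0.0, 0.1, 0.2], f0=[100.0, 110.0, 120.0])]
result = ToyExtractor(features).extract(properties, time=[0.0, 0.19])
print(result)
```

The dictionary keys become the names of the output features, which is why the same feature class can appear several times under different keys (for example, one entry per formant index).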
Module Contents
Classes
BaseFeature | Base class for features.
FeaturePitchF0 | Extract voice pitch as the fundamental frequency F0 in Hz.
FeatureJitter | Extract local jitter relative to the fundamental frequency.
FeatureShimmer | Extract local shimmer relative to the fundamental frequency.
FeatureHnr | Extract the harmonicity-to-noise ratio in dB.
FeatureFormantFreq | Extract formant central frequency in Hz.
FeatureFormantBandwidth | Extract formant frequency bandwidth in Hz.
FeatureFormantAmplitude | Extract formant amplitude relative to F0 harmonic amplitude.
FeatureAlphaRatio | Extract the alpha ratio in dB.
FeatureHammarIndex | Extract the Hammarberg index in dB.
FeatureSpectralSlope | Extract spectral slopes for frequency bands.
FeatureHarmonicDifference | Extract the difference between pitch harmonic and/or formant amplitudes in dB.
FeatureMfcc | Extract Mel frequency cepstral coefficients (MFCCs).
FeatureSpectralFlux | Extract spectral flux.
FeatureRmsEnergy | Extract the root mean squared energy in dB.
VoiceExtractor | Extract voice features from an audio file.
Functions
Command line interface for extracting voice features.
- class mexca.audio.extraction.BaseFeature
Base class for features.
Can be used to create custom voice feature extraction classes.
- requires() -> Optional[Dict[str, type]]
Specify objects required for feature extraction.
This method can be overridden to return a dictionary whose keys are the names of the objects required for computing features and whose values are the types of these objects. The VoiceExtractor object looks for objects with the specified types and adds them as attributes to the feature class under the names given by the dictionary keys.
- Returns:
Dictionary where keys are the names and values the types of required objects.
- Return type:
Optional[Dict[str, type]]
- apply(time: numpy.ndarray) -> numpy.ndarray
Extract features at time points by linear interpolation.
- Parameters:
time (numpy.ndarray) – Time points.
- Returns:
Feature values interpolated at time points.
- Return type:
numpy.ndarray
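What apply() means by linear interpolation can be illustrated without mexca. The helper below is a hypothetical, pure-Python sketch: interp_linear, frame_ts, and frame_values are illustrative names, not part of the library. How the library treats time points outside the range of the frames is not stated here, so clamping to the edge values is an assumption. numpy.interp behaves the same way for increasing sample points.

```python
from bisect import bisect_right
from typing import List, Sequence


def interp_linear(time: Sequence[float], frame_ts: Sequence[float],
                  frame_values: Sequence[float]) -> List[float]:
    """Linearly interpolate frame_values (sampled at frame_ts) at time points."""
    out = []
    for t in time:
        if t <= frame_ts[0]:
            out.append(frame_values[0])   # assumption: clamp on the left
        elif t >= frame_ts[-1]:
            out.append(frame_values[-1])  # assumption: clamp on the right
        else:
            hi = bisect_right(frame_ts, t)  # first frame strictly after t
            lo = hi - 1
            w = (t - frame_ts[lo]) / (frame_ts[hi] - frame_ts[lo])
            out.append(frame_values[lo] + w * (frame_values[hi] - frame_values[lo]))
    return out


# Frames at 0, 1, and 2 seconds; query points between and beyond them.
f0 = interp_linear([0.0, 0.5, 1.5, 3.0], [0.0, 1.0, 2.0], [100.0, 120.0, 110.0])
print(f0)
```

Interpolating at arbitrary time points lets features computed on different frame grids be aligned to one shared time axis.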
- class mexca.audio.extraction.FeaturePitchF0
Bases: BaseFeature
Extract voice pitch as the fundamental frequency F0 in Hz.
- requires() -> Optional[Dict[str, mexca.audio.features.PitchFrames]]
Specify objects required for feature extraction.
- Returns:
Dictionary with key pitch_frames.
- Return type:
Optional[Dict[str, mexca.audio.features.PitchFrames]]
- apply(time: numpy.ndarray) -> numpy.ndarray
Extract features at time points by linear interpolation.
- Parameters:
time (numpy.ndarray) – Time points.
- Returns:
Feature values interpolated at time points.
- Return type:
numpy.ndarray
- class mexca.audio.extraction.FeatureJitter
Bases: BaseFeature
Extract local jitter relative to the fundamental frequency.
- requires() -> Optional[Dict[str, mexca.audio.features.JitterFrames]]
Specify objects required for feature extraction.
- Returns:
Dictionary with key jitter_frames.
- Return type:
Optional[Dict[str, mexca.audio.features.JitterFrames]]
- apply(time: numpy.ndarray) -> numpy.ndarray
Extract features at time points by linear interpolation.
- Parameters:
time (numpy.ndarray) – Time points.
- Returns:
Feature values interpolated at time points.
- Return type:
numpy.ndarray
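For orientation, local jitter is conventionally defined as the mean absolute difference between consecutive glottal periods divided by the mean period. The sketch below assumes that conventional definition; local_jitter and the period values are illustrative, and whether the library computes jitter in exactly this way per frame is not stated here.

```python
from typing import Sequence


def local_jitter(periods: Sequence[float]) -> float:
    """Mean absolute difference of consecutive periods over the mean period."""
    if len(periods) < 2:
        raise ValueError("At least two periods are needed")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_abs_diff / mean_period


# Hypothetical glottal periods in seconds, alternating between 10 and 11 ms.
jitter = local_jitter([0.010, 0.011, 0.010, 0.011])
print(round(jitter, 4))
```

Local shimmer follows the same pattern with peak amplitudes in place of period lengths.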
- class mexca.audio.extraction.FeatureShimmer
Bases: BaseFeature
Extract local shimmer relative to the fundamental frequency.
- requires() -> Optional[Dict[str, mexca.audio.features.ShimmerFrames]]
Specify objects required for feature extraction.
- Returns:
Dictionary with key shimmer_frames.
- Return type:
Optional[Dict[str, mexca.audio.features.ShimmerFrames]]
- apply(time: numpy.ndarray) -> numpy.ndarray
Extract features at time points by linear interpolation.
- Parameters:
time (numpy.ndarray) – Time points.
- Returns:
Feature values interpolated at time points.
- Return type:
numpy.ndarray
- class mexca.audio.extraction.FeatureHnr
Bases: BaseFeature
Extract the harmonicity-to-noise ratio in dB.
- requires() -> Optional[Dict[str, mexca.audio.features.HnrFrames]]
Specify objects required for feature extraction.
- Returns:
Dictionary with key hnr_frames.
- Return type:
Optional[Dict[str, mexca.audio.features.HnrFrames]]
- apply(time: numpy.ndarray) -> numpy.ndarray
Extract features at time points by linear interpolation.
- Parameters:
time (numpy.ndarray) – Time points.
- Returns:
Feature values interpolated at time points.
- Return type:
numpy.ndarray
- class mexca.audio.extraction.FeatureFormantFreq(n_formant: int)
Bases: BaseFeature
Extract formant central frequency in Hz.
- Parameters:
n_formant (int) – Index of the formant (starting at 0).
- requires() -> Optional[Dict[str, mexca.audio.features.FormantFrames]]
Specify objects required for feature extraction.
- Returns:
Dictionary with key formant_frames.
- Return type:
Optional[Dict[str, mexca.audio.features.FormantFrames]]
- apply(time: numpy.ndarray) -> Optional[numpy.ndarray]
Extract features at time points by linear interpolation.
- Parameters:
time (numpy.ndarray) – Time points.
- Returns:
Feature values interpolated at time points.
- Return type:
Optional[numpy.ndarray]
- class mexca.audio.extraction.FeatureFormantBandwidth(n_formant: int)
Bases: FeatureFormantFreq
Extract formant frequency bandwidth in Hz.
- Parameters:
n_formant (int) – Index of the formant (starting at 0).
- apply(time: numpy.ndarray) -> Optional[numpy.ndarray]
Extract features at time points by linear interpolation.
- Parameters:
time (numpy.ndarray) – Time points.
- Returns:
Feature values interpolated at time points.
- Return type:
Optional[numpy.ndarray]
- class mexca.audio.extraction.FeatureFormantAmplitude(n_formant: int)
Bases: BaseFeature
Extract formant amplitude relative to F0 harmonic amplitude.
- Parameters:
n_formant (int) – Index of the formant (starting at 0).
- requires() -> Optional[Dict[str, mexca.audio.features.FormantAmplitudeFrames]]
Specify objects required for feature extraction.
- Returns:
Dictionary with key formant_amp_frames.
- Return type:
Optional[Dict[str, mexca.audio.features.FormantAmplitudeFrames]]
- apply(time: numpy.ndarray) -> Optional[numpy.ndarray]
Extract features at time points by linear interpolation.
- Parameters:
time (numpy.ndarray) – Time points.
- Returns:
Feature values interpolated at time points.
- Return type:
Optional[numpy.ndarray]
- class mexca.audio.extraction.FeatureAlphaRatio
Bases: BaseFeature
Extract the alpha ratio in dB.
- requires() -> Optional[Dict[str, mexca.audio.features.AlphaRatioFrames]]
Specify objects required for feature extraction.
- Returns:
Dictionary with key alpha_ratio_frames.
- Return type:
Optional[Dict[str, mexca.audio.features.AlphaRatioFrames]]
- apply(time: numpy.ndarray) -> numpy.ndarray
Extract features at time points by linear interpolation.
- Parameters:
time (numpy.ndarray) – Time points.
- Returns:
Feature values interpolated at time points.
- Return type:
numpy.ndarray
- class mexca.audio.extraction.FeatureHammarIndex
Bases: BaseFeature
Extract the Hammarberg index in dB.
- requires() -> Optional[Dict[str, mexca.audio.features.HammarIndexFrames]]
Specify objects required for feature extraction.
- Returns:
Dictionary with key hammar_index_frames.
- Return type:
Optional[Dict[str, mexca.audio.features.HammarIndexFrames]]
- apply(time: numpy.ndarray) -> numpy.ndarray
Extract features at time points by linear interpolation.
- Parameters:
time (numpy.ndarray) – Time points.
- Returns:
Feature values interpolated at time points.
- Return type:
numpy.ndarray
- class mexca.audio.extraction.FeatureSpectralSlope(lower: float, upper: float)
Bases: BaseFeature
Extract spectral slopes for frequency bands.
- Parameters:
lower (float) – Lower boundary of the frequency band for which to extract the spectral slope.
upper (float) – Upper boundary of the frequency band for which to extract the spectral slope. A band with these lower and upper boundaries must exist in the required spectral_slope_frames object.
- requires() -> Optional[Dict[str, type]]
Specify objects required for feature extraction.
- Returns:
Dictionary with key spectral_slope_frames.
- Return type:
Optional[Dict[str, type]]
- apply(time: numpy.ndarray) -> numpy.ndarray
Extract features at time points by linear interpolation.
- Parameters:
time (numpy.ndarray) – Time points.
- Returns:
Feature values interpolated at time points.
- Return type:
numpy.ndarray
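The requirement that a band with the given boundaries must already exist can be pictured as a lookup. The sketch below is hypothetical: select_band, the band list, and the slope values are illustrative, and raising ValueError for a missing band is an assumption rather than documented behavior.

```python
from typing import List, Sequence, Tuple


def select_band(bands: Sequence[Tuple[float, float]],
                slopes: Sequence[Sequence[float]],
                lower: float, upper: float) -> List[float]:
    """Return the per-frame slopes of the band with boundaries (lower, upper)."""
    try:
        idx = list(bands).index((lower, upper))
    except ValueError:
        # Assumption: a missing band is reported as an error.
        raise ValueError(f"No band with boundaries ({lower}, {upper})") from None
    return [frame[idx] for frame in slopes]


# Two hypothetical bands in Hz and one slope per band for each of two frames.
bands = [(0.0, 500.0), (500.0, 1500.0)]
slopes = [[-0.02, -0.01], [-0.03, -0.015]]
band_slopes = select_band(bands, slopes, lower=500.0, upper=1500.0)
print(band_slopes)
```

Both boundaries have to match an existing band exactly; the feature object selects a band and does not compute a new one.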
- class mexca.audio.extraction.FeatureHarmonicDifference(x_idx: int = 0, x_type: str = 'h', y_idx: int = 1, y_type: str = 'h')
Bases: BaseFeature
Extract the difference between pitch harmonic and/or formant amplitudes in dB.
- Parameters:
x_idx (int, default=0) – Index of the first amplitude.
y_idx (int, default=1) – Index of the second amplitude.
x_type (str, default='h') – Type of the first amplitude. Must be either ‘h’ for pitch harmonic or ‘f’ for formant.
y_type (str, default='h') – Type of the second amplitude. Must be either ‘h’ for pitch harmonic or ‘f’ for formant.
- Raises:
ValueError – If x_type or y_type is not ‘h’ or ‘f’.
- requires() -> Optional[Dict[str, Union[mexca.audio.features.FormantAmplitudeFrames, mexca.audio.features.PitchHarmonicsFrames]]]
Specify objects required for feature extraction.
- Returns:
Dictionary with keys formant_amp_frames and pitch_harmonics_frames.
- Return type:
Optional[Dict[str, Union[mexca.audio.features.FormantAmplitudeFrames, mexca.audio.features.PitchHarmonicsFrames]]]
- apply(time: numpy.ndarray) -> numpy.ndarray
Extract features at time points by linear interpolation.
- Parameters:
time (numpy.ndarray) – Time points.
- Returns:
Feature values interpolated at time points.
- Return type:
numpy.ndarray
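How the index and type parameters combine can be sketched for a single frame. This is a hypothetical illustration: amplitude_difference is not a library function, the amplitudes are assumed to be in dB already, and taking the first amplitude minus the second is an assumption about the sign convention.

```python
from typing import Sequence


def amplitude_difference(harmonic_amps_db: Sequence[float],
                         formant_amps_db: Sequence[float],
                         x_idx: int = 0, x_type: str = "h",
                         y_idx: int = 1, y_type: str = "h") -> float:
    """Difference between two amplitudes selected by type and index."""
    for name, value in (("x_type", x_type), ("y_type", y_type)):
        if value not in ("h", "f"):
            raise ValueError(f"{name} must be 'h' or 'f', got {value!r}")
    sources = {"h": harmonic_amps_db, "f": formant_amps_db}
    # Assumption: amplitudes are in dB and the result is x minus y.
    return sources[x_type][x_idx] - sources[y_type][y_idx]


harmonics = [60.0, 54.0]  # hypothetical harmonic amplitudes in dB
formants = [48.0, 40.0]   # hypothetical formant amplitudes in dB

# Defaults: first harmonic versus second harmonic.
h_diff = amplitude_difference(harmonics, formants)
# First harmonic versus the formant at index 1.
hf_diff = amplitude_difference(harmonics, formants, y_idx=1, y_type="f")
print(h_diff, hf_diff)
```

With the default arguments the feature compares the first two pitch harmonics; switching either type to 'f' compares against a formant amplitude instead.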
- class mexca.audio.extraction.FeatureMfcc(n_mfcc: int = 0)
Bases: BaseFeature
Extract Mel frequency cepstral coefficients (MFCCs).
- Parameters:
n_mfcc (int, default=0) – Index of the MFCC to be extracted.
- requires() -> Optional[Dict[str, mexca.audio.features.MfccFrames]]
Specify objects required for feature extraction.
- Returns:
Dictionary with key mfcc_frames.
- Return type:
Optional[Dict[str, mexca.audio.features.MfccFrames]]
- apply(time: numpy.ndarray) -> numpy.ndarray
Extract features at time points by linear interpolation.
- Parameters:
time (numpy.ndarray) – Time points.
- Returns:
Feature values interpolated at time points.
- Return type:
numpy.ndarray
- class mexca.audio.extraction.FeatureSpectralFlux
Bases: BaseFeature
Extract spectral flux.
- requires() -> Optional[Dict[str, mexca.audio.features.SpectralFluxFrames]]
Specify objects required for feature extraction.
- Returns:
Dictionary with key spectral_flux_frames.
- Return type:
Optional[Dict[str, mexca.audio.features.SpectralFluxFrames]]
- apply(time: numpy.ndarray) -> numpy.ndarray
Extract features at time points by linear interpolation.
- Parameters:
time (numpy.ndarray) – Time points.
- Returns:
Feature values interpolated at time points.
- Return type:
numpy.ndarray
- class mexca.audio.extraction.FeatureRmsEnergy
Bases: BaseFeature
Extract the root mean squared energy in dB.
- requires() -> Optional[Dict[str, type]]
Specify objects required for feature extraction.
- Returns:
Dictionary with key rms_frames.
- Return type:
Optional[Dict[str, type]]
- apply(time: numpy.ndarray) -> numpy.ndarray
Extract features at time points by linear interpolation.
- Parameters:
time (numpy.ndarray) – Time points.
- Returns:
Feature values interpolated at time points.
- Return type:
numpy.ndarray
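As a point of reference, root mean squared energy on a decibel scale can be computed for one frame as below. This is a minimal sketch with assumptions: rms_db is an illustrative name, the reference amplitude is taken to be 1.0, and the small floor eps (which avoids taking the logarithm of zero for silent frames) is not taken from the library.

```python
import math
from typing import Sequence


def rms_db(samples: Sequence[float], eps: float = 1e-10) -> float:
    """Root mean squared amplitude of one frame in dB relative to 1.0."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Assumption: floor the value so silent frames stay finite.
    return 20.0 * math.log10(max(rms, eps))


# A frame alternating between +0.1 and -0.1 has an RMS amplitude of 0.1.
level = rms_db([0.1, -0.1, 0.1, -0.1])
print(round(level, 2))
```

The factor 20 applies because RMS is an amplitude quantity; a tenfold drop in amplitude corresponds to a drop of 20 dB.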
- class mexca.audio.extraction.VoiceExtractor(features: Optional[Dict[str, BaseFeature]] = None, config: Optional[mexca.data.VoiceFeaturesConfig] = None)
Extract voice features from an audio file.
For default features, see the Output section.
- Parameters:
features (dict, optional, default=None) – Dictionary with keys as feature names and values as feature extraction objects. If None, default features are extracted.
config (VoiceFeaturesConfig, optional, default=None) – Voice feature extraction configuration object. If None, uses VoiceFeaturesConfig’s default configuration.