`mexca.audio.extraction`

Extract voice features from an audio file.

Construct a dictionary with feature names as keys and feature objects as values. The dictionary can be passed to the VoiceExtractor to extract the specified features. Feature objects require lower-level voice signal properties, which are declared in the requires() method of each feature class. The VoiceExtractor class computes these properties and supplies them to the feature objects.

Module Contents

Classes

`BaseFeature`	Base class for features.
`FeaturePitchF0`	Extract voice pitch as the fundamental frequency F0 in Hz.
`FeatureJitter`	Extract local jitter relative to the fundamental frequency.
`FeatureShimmer`	Extract local shimmer relative to the fundamental frequency.
`FeatureHnr`	Extract the harmonics-to-noise ratio in dB.
`FeatureFormantFreq`	Extract formant central frequency in Hz.
`FeatureFormantBandwidth`	Extract formant frequency bandwidth in Hz.
`FeatureFormantAmplitude`	Extract formant amplitude relative to F0 harmonic amplitude.
`FeatureAlphaRatio`	Extract the alpha ratio in dB.
`FeatureHammarIndex`	Extract the Hammarberg index in dB.
`FeatureSpectralSlope`	Extract spectral slopes for frequency bands.
`FeatureHarmonicDifference`	Extract the difference between pitch harmonic and/or formant amplitudes in dB.
`FeatureMfcc`	Extract Mel frequency cepstral coefficients (MFCCs).
`FeatureSpectralFlux`	Extract spectral flux.
`FeatureRmsEnergy`	Extract the root mean squared energy in dB.
`VoiceExtractor`	Extract voice features from an audio file.

Functions

`cli`	Command line interface for extracting voice features.

class mexca.audio.extraction.BaseFeature[source]

Base class for features.

Can be used to create custom voice feature extraction classes.

requires() → Optional[Dict[str, type]][source]

Specify objects required for feature extraction.

This method can be overridden to return a dictionary whose keys are the names of the objects required for computing features and whose values are the types of these objects. The VoiceExtractor object looks for objects of the specified types and adds them as attributes to the feature object under the names given by the dictionary keys.

Returns:: Dictionary where keys are the names and values the types of required objects.
Return type:: dict
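
The lookup described above can be sketched with plain Python stand-ins. This is a minimal illustration of the requires() contract, not mexca's implementation: PitchFramesStub, CustomFeature, and supply_requirements are hypothetical names, and the real VoiceExtractor may match and attach objects differently.

```python
from typing import Dict, Optional


class PitchFramesStub:
    """Hypothetical stand-in for a lower-level voice signal property."""

    def __init__(self, values):
        self.values = values


class CustomFeature:
    """Hypothetical feature class following the requires() contract."""

    def requires(self) -> Optional[Dict[str, type]]:
        # Key: attribute name to set; value: type of the required object.
        return {"pitch_frames": PitchFramesStub}


def supply_requirements(feature, available):
    """Attach each required object to the feature under its dictionary key.

    Simplified stand-in for the behavior described for the extractor.
    """
    required = feature.requires() or {}
    for name, required_type in required.items():
        # Pick the first available object that has the requested type.
        match = next(
            (obj for obj in available if isinstance(obj, required_type)), None
        )
        if match is None:
            raise LookupError(f"No object of type {required_type.__name__}")
        setattr(feature, name, match)
    return feature


feature = supply_requirements(CustomFeature(), [PitchFramesStub([110.0, 112.5])])
print(feature.pitch_frames.values)
```

After the call, the feature object can read its input from the pitch_frames attribute, which is the name it declared as a dictionary key.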

apply(time: numpy.ndarray) → numpy.ndarray[source]

Extract features at time points by linear interpolation.

Parameters:: time (numpy.ndarray) – Time points.
Returns:: Feature values interpolated at time points.
Return type:: numpy.ndarray
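
To make "linear interpolation at time points" concrete, here is a small pure-Python sketch. The function interpolate_linear and its arguments are hypothetical; clamping to the edge values outside the frame range is an assumption, and the library may treat out-of-range points differently.

```python
from bisect import bisect_right
from typing import List, Sequence


def interpolate_linear(
    time: Sequence[float],
    frame_times: Sequence[float],
    frame_values: Sequence[float],
) -> List[float]:
    """Linearly interpolate frame values at the requested time points.

    Assumes frame_times is sorted in increasing order. Points outside the
    frame range are clamped to the nearest edge value (an assumption).
    """
    result = []
    for t in time:
        if t <= frame_times[0]:
            result.append(frame_values[0])
        elif t >= frame_times[-1]:
            result.append(frame_values[-1])
        else:
            # Index of the first frame time strictly greater than t.
            hi = bisect_right(frame_times, t)
            lo = hi - 1
            weight = (t - frame_times[lo]) / (frame_times[hi] - frame_times[lo])
            result.append(
                frame_values[lo] + weight * (frame_values[hi] - frame_values[lo])
            )
    return result


# Frames at 0 s, 1 s and 2 s, queried between and beyond the frames.
values = interpolate_linear([0.5, 1.5, 3.0], [0.0, 1.0, 2.0], [100.0, 200.0, 150.0])
print(values)
```

Each requested time point falls between two frames, and the result is the weighted average of the two neighboring frame values.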

class mexca.audio.extraction.FeaturePitchF0[source]

Bases: BaseFeature

Extract voice pitch as the fundamental frequency F0 in Hz.

requires() → Optional[Dict[str, mexca.audio.features.PitchFrames]][source]

Specify objects required for feature extraction.

Returns:: Dictionary with key pitch_frames.
Return type:: dict

apply(time: numpy.ndarray) → numpy.ndarray[source]

Extract features at time points by linear interpolation.

Parameters:: time (numpy.ndarray) – Time points.
Returns:: Feature values interpolated at time points.
Return type:: numpy.ndarray

class mexca.audio.extraction.FeatureJitter[source]

Bases: BaseFeature

Extract local jitter relative to the fundamental frequency.

requires() → Optional[Dict[str, mexca.audio.features.JitterFrames]][source]

Specify objects required for feature extraction.

Returns:: Dictionary with key jitter_frames.
Return type:: dict

apply(time: numpy.ndarray) → numpy.ndarray[source]

Extract features at time points by linear interpolation.

Parameters:: time (numpy.ndarray) – Time points.
Returns:: Feature values interpolated at time points.
Return type:: numpy.ndarray

class mexca.audio.extraction.FeatureShimmer[source]

Bases: BaseFeature

Extract local shimmer relative to the fundamental frequency.

requires() → Optional[Dict[str, mexca.audio.features.ShimmerFrames]][source]

Specify objects required for feature extraction.

Returns:: Dictionary with key shimmer_frames.
Return type:: dict

apply(time: numpy.ndarray) → numpy.ndarray[source]

Extract features at time points by linear interpolation.

Parameters:: time (numpy.ndarray) – Time points.
Returns:: Feature values interpolated at time points.
Return type:: numpy.ndarray

class mexca.audio.extraction.FeatureHnr[source]

Bases: BaseFeature

Extract the harmonics-to-noise ratio in dB.

requires() → Optional[Dict[str, mexca.audio.features.HnrFrames]][source]

Specify objects required for feature extraction.

Returns:: Dictionary with key hnr_frames.
Return type:: dict

apply(time: numpy.ndarray) → numpy.ndarray[source]

Extract features at time points by linear interpolation.

Parameters:: time (numpy.ndarray) – Time points.
Returns:: Feature values interpolated at time points.
Return type:: numpy.ndarray

class mexca.audio.extraction.FeatureFormantFreq(n_formant: int)[source]

Bases: BaseFeature

Extract formant central frequency in Hz.

Parameters:: n_formant (int) – Index of the formant (starting at 0).

requires() → Optional[Dict[str, mexca.audio.features.FormantFrames]][source]

Specify objects required for feature extraction.

Returns:: Dictionary with key formant_frames.
Return type:: dict

apply(time: numpy.ndarray) → Optional[numpy.ndarray][source]

Extract features at time points by linear interpolation.

Parameters:: time (numpy.ndarray) – Time points.
Returns:: Feature values interpolated at time points.
Return type:: numpy.ndarray

class mexca.audio.extraction.FeatureFormantBandwidth(n_formant: int)[source]

Bases: FeatureFormantFreq

Extract formant frequency bandwidth in Hz.

Parameters:: n_formant (int) – Index of the formant (starting at 0).

apply(time: numpy.ndarray) → Optional[numpy.ndarray][source]

Extract features at time points by linear interpolation.

Parameters:: time (numpy.ndarray) – Time points.
Returns:: Feature values interpolated at time points.
Return type:: numpy.ndarray

class mexca.audio.extraction.FeatureFormantAmplitude(n_formant: int)[source]

Bases: BaseFeature

Extract formant amplitude relative to F0 harmonic amplitude.

Parameters:: n_formant (int) – Index of the formant (starting at 0).

requires() → Optional[Dict[str, mexca.audio.features.FormantAmplitudeFrames]][source]

Specify objects required for feature extraction.

Returns:: Dictionary with key formant_amp_frames.
Return type:: dict

apply(time: numpy.ndarray) → Optional[numpy.ndarray][source]

Extract features at time points by linear interpolation.

Parameters:: time (numpy.ndarray) – Time points.
Returns:: Feature values interpolated at time points.
Return type:: numpy.ndarray

class mexca.audio.extraction.FeatureAlphaRatio[source]

Bases: BaseFeature

Extract the alpha ratio in dB.

requires() → Optional[Dict[str, mexca.audio.features.AlphaRatioFrames]][source]

Specify objects required for feature extraction.

Returns:: Dictionary with key alpha_ratio_frames.
Return type:: dict

apply(time: numpy.ndarray) → numpy.ndarray[source]

Extract features at time points by linear interpolation.

Parameters:: time (numpy.ndarray) – Time points.
Returns:: Feature values interpolated at time points.
Return type:: numpy.ndarray

class mexca.audio.extraction.FeatureHammarIndex[source]

Bases: BaseFeature

Extract the Hammarberg index in dB.

requires() → Optional[Dict[str, mexca.audio.features.HammarIndexFrames]][source]

Specify objects required for feature extraction.

Returns:: Dictionary with key hammar_index_frames.
Return type:: dict

apply(time: numpy.ndarray) → numpy.ndarray[source]

Extract features at time points by linear interpolation.

Parameters:: time (numpy.ndarray) – Time points.
Returns:: Feature values interpolated at time points.
Return type:: numpy.ndarray

class mexca.audio.extraction.FeatureSpectralSlope(lower: float, upper: float)[source]

Bases: BaseFeature

Extract spectral slopes for frequency bands.

Parameters:

lower (float) – Lower boundary of the frequency band for which to extract the spectral slope. A band with boundaries lower and upper must exist in the required spectral_slope_frames object.
upper (float) – Upper boundary of the frequency band for which to extract the spectral slope. A band with boundaries lower and upper must exist in the required spectral_slope_frames object.

requires() → Optional[Dict[str, type]][source]

Specify objects required for feature extraction.

Returns:: Dictionary with key spectral_slope_frames.
Return type:: dict

apply(time: numpy.ndarray) → numpy.ndarray[source]

Extract features at time points by linear interpolation.

Parameters:: time (numpy.ndarray) – Time points.
Returns:: Feature values interpolated at time points.
Return type:: numpy.ndarray

class mexca.audio.extraction.FeatureHarmonicDifference(x_idx: int = 0, x_type: str = 'h', y_idx: int = 1, y_type: str = 'h')[source]

Bases: BaseFeature

Extract the difference between pitch harmonic and/or formant amplitudes in dB.

Parameters:

x_idx (int, default=0) – Index of the first amplitude.
y_idx (int, default=1) – Index of the second amplitude.
x_type (str, default='h') – Type of the first amplitude. Must be either ‘h’ for pitch harmonic or ‘f’ for formant.
y_type (str, default='h') – Type of the second amplitude. Must be either ‘h’ for pitch harmonic or ‘f’ for formant.

Raises:

ValueError – If x_type or y_type is not ‘h’ or ‘f’.

requires() → Optional[Dict[str, Union[mexca.audio.features.FormantAmplitudeFrames, mexca.audio.features.PitchHarmonicsFrames]]][source]

Specify objects required for feature extraction.

Returns:: Dictionary with keys formant_amp_frames and pitch_harmonics_frames.
Return type:: dict

apply(time: numpy.ndarray) → numpy.ndarray[source]

Extract features at time points by linear interpolation.

Parameters:: time (numpy.ndarray) – Time points.
Returns:: Feature values interpolated at time points.
Return type:: numpy.ndarray
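
How the four parameters select the two amplitudes can be sketched for a single frame. The function amplitude_difference and its list inputs are hypothetical stand-ins; computing the first amplitude minus the second is an assumption about the sign convention, not something this page confirms.

```python
from typing import Sequence


def amplitude_difference(
    harmonic_amps: Sequence[float],
    formant_amps: Sequence[float],
    x_idx: int = 0,
    x_type: str = "h",
    y_idx: int = 1,
    y_type: str = "h",
) -> float:
    """Return the difference between two amplitudes given in dB."""
    for name, value in (("x_type", x_type), ("y_type", y_type)):
        if value not in ("h", "f"):
            raise ValueError(f"{name} must be 'h' or 'f', got {value!r}")
    # 'h' selects a pitch harmonic amplitude, 'f' a formant amplitude.
    sources = {"h": harmonic_amps, "f": formant_amps}
    # Assumed direction: first (x) amplitude minus second (y) amplitude.
    return sources[x_type][x_idx] - sources[y_type][y_idx]


harmonics = [-10.0, -16.0, -25.0]  # made-up harmonic amplitudes in dB
formants = [-12.0, -20.0]          # made-up formant amplitudes in dB

print(amplitude_difference(harmonics, formants))
print(amplitude_difference(harmonics, formants, y_idx=0, y_type="f"))
```

With the defaults, the first two pitch harmonics are compared; setting y_type to 'f' compares the first harmonic with a formant amplitude instead.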

class mexca.audio.extraction.FeatureMfcc(n_mfcc: int = 0)[source]

Bases: BaseFeature

Extract Mel frequency cepstral coefficients (MFCCs).

Parameters:: n_mfcc (int, default=0) – Index of the MFCC to be extracted.

requires() → Optional[Dict[str, mexca.audio.features.MfccFrames]][source]

Specify objects required for feature extraction.

Returns:: Dictionary with key mfcc_frames.
Return type:: dict

apply(time: numpy.ndarray) → numpy.ndarray[source]

Extract features at time points by linear interpolation.

Parameters:: time (numpy.ndarray) – Time points.
Returns:: Feature values interpolated at time points.
Return type:: numpy.ndarray

class mexca.audio.extraction.FeatureSpectralFlux[source]

Bases: BaseFeature

Extract spectral flux.

requires() → Optional[Dict[str, mexca.audio.features.SpectralFluxFrames]][source]

Specify objects required for feature extraction.

Returns:: Dictionary with key spectral_flux_frames.
Return type:: dict

apply(time: numpy.ndarray) → numpy.ndarray[source]

Extract features at time points by linear interpolation.

Parameters:: time (numpy.ndarray) – Time points.
Returns:: Feature values interpolated at time points.
Return type:: numpy.ndarray

class mexca.audio.extraction.FeatureRmsEnergy[source]

Bases: BaseFeature

Extract the root mean squared energy in dB.

requires() → Optional[Dict[str, type]][source]

Specify objects required for feature extraction.

Returns:: Dictionary with key rms_frames.
Return type:: dict

apply(time: numpy.ndarray) → numpy.ndarray[source]

Extract features at time points by linear interpolation.

Parameters:: time (numpy.ndarray) – Time points.
Returns:: Feature values interpolated at time points.
Return type:: numpy.ndarray

class mexca.audio.extraction.VoiceExtractor(features: Optional[Dict[str, BaseFeature]] = None, config: Optional[mexca.data.VoiceFeaturesConfig] = None)[source]

Extract voice features from an audio file.

For default features, see the Output section.

Parameters:

features (dict, optional, default=None) – Dictionary with keys as feature names and values as feature extraction objects. If None, default features are extracted.
config (VoiceFeaturesConfig, optional, default=None) – Voice feature extraction configuration object. If None, uses VoiceFeaturesConfig’s default configuration.

apply(filepath: str, time_step: float, skip_frames: int = 1) → mexca.data.VoiceFeatures[source]

Extract voice features from an audio file.

Parameters:

filepath (str) – Path to the audio file.
time_step (float) – The interval between time points at which features are extracted.
skip_frames (int, default=1) – Only process every nth frame, starting at frame 0. The default of 1 processes every frame.

Returns:

A data class object containing the extracted voice features.

Return type:

VoiceFeatures
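
The effect of skip_frames can be illustrated with a short sketch. The helper kept_frame_indices is hypothetical and only mirrors the description "every nth frame, starting at 0"; it does not call mexca.

```python
from typing import List


def kept_frame_indices(n_frames: int, skip_frames: int = 1) -> List[int]:
    """Return the indices of the frames that would be processed."""
    if skip_frames < 1:
        raise ValueError("skip_frames must be a positive integer")
    # Start at frame 0 and step by skip_frames.
    return list(range(0, n_frames, skip_frames))


print(kept_frame_indices(10))                 # default: every frame
print(kept_frame_indices(10, skip_frames=3))  # every third frame
```

Larger values of skip_frames reduce the number of processed frames roughly in proportion.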

mexca.audio.extraction.cli()[source]: Command line interface for extracting voice features. See extract-voice -h for details.
