Speaker Recognition

Teach VoiceMem to recognize who is speaking.

How It Works

Enroll each known person with 3 or more voice samples of natural speech. When you record a new clip, VoiceMem compares it against enrolled voiceprints and identifies the closest match.

More samples generally improve accuracy. Aim for 3-5 samples per person, each a different sentence.
If the similarity of the closest match is below the threshold (SPEAKER_THRESHOLD, default 0.75), the speaker is marked Unknown.
Voiceprints are computed from MFCC features, which capture vocal characteristics rather than the words spoken.
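
The matching step described above can be illustrated with a minimal sketch. It is not VoiceMem's actual code: the function names (enroll, identify, cosine_similarity), the use of cosine similarity as the score, and the averaging of samples into a single voiceprint are all assumptions made for illustration. Feature vectors are plain lists of floats standing in for MFCC-derived features.

```python
import math

# Default taken from the text above; how VoiceMem reads it is not shown here.
SPEAKER_THRESHOLD = 0.75


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed scoring method)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def enroll(samples):
    """Average several per-sample feature vectors into one voiceprint (assumption)."""
    if not samples:
        raise ValueError("at least one sample is required")
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def identify(clip_vector, voiceprints, threshold=SPEAKER_THRESHOLD):
    """Return (name, score) for the closest voiceprint, or ("Unknown", score)."""
    best_name, best_score = "Unknown", -1.0
    for name, voiceprint in voiceprints.items():
        score = cosine_similarity(clip_vector, voiceprint)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return "Unknown", best_score
    return best_name, best_score


# Hypothetical enrolled persons with three toy samples each.
voiceprints = {
    "alice": enroll([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [1.0, 0.1, 0.1]]),
    "bob": enroll([[0.0, 1.0, 0.0], [0.1, 0.9, 0.0], [0.0, 1.0, 0.1]]),
}
name, score = identify([1.0, 0.05, 0.0], voiceprints)
print(name, round(score, 3))
```

In this sketch a clip that resembles neither enrolled person scores below the threshold against both voiceprints and comes back as Unknown, which mirrors the behavior described for SPEAKER_THRESHOLD.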

Add a Person

Enrolled Persons