Speaker Recognition
Teach VoiceMem to recognize who is speaking.
How It Works
Enroll each known person with 3 or more voice samples of natural speech. When you record a new clip, VoiceMem compares it against enrolled voiceprints and identifies the closest match.
- More samples generally mean better accuracy. Aim for 3-5 samples per person, each a different sentence.
- If similarity is below the threshold (SPEAKER_THRESHOLD, default 0.75), the speaker is marked Unknown.
- Voiceprints are computed from MFCC features: they capture vocal characteristics, not the words spoken.
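The matching step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not VoiceMem's actual code: it assumes each voiceprint is a fixed-length feature vector (for example, averaged MFCC coefficients), that a person's voiceprint is the mean of their enrolled samples, and that "similarity" means cosine similarity. The function names `mean_vector` and `identify` are hypothetical; only `SPEAKER_THRESHOLD` and its default of 0.75 come from the text.

```python
import math

SPEAKER_THRESHOLD = 0.75  # default value named in the text above


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def mean_vector(samples):
    """Average several per-sample feature vectors into one voiceprint (assumption)."""
    count = len(samples)
    return [sum(column) / count for column in zip(*samples)]


def identify(clip_vector, enrolled, threshold=SPEAKER_THRESHOLD):
    """Return (name, score) for the closest enrolled voiceprint, or "Unknown"."""
    best_name, best_score = "Unknown", -1.0
    for name, voiceprint in enrolled.items():
        score = cosine_similarity(clip_vector, voiceprint)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return "Unknown", best_score
    return best_name, best_score


# Toy 3-dimensional vectors standing in for real MFCC-derived features.
enrolled = {
    "alice": mean_vector([[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [1.1, 0.0, 0.0]]),
    "bob": mean_vector([[0.0, 1.0, 0.2], [0.1, 0.9, 0.1], [0.0, 1.1, 0.0]]),
}

name, score = identify([1.0, 0.1, 0.05], enrolled)
print(name, round(score, 3))
```

Averaging three to five samples per person smooths out sentence-to-sentence variation, which is one plausible reason more samples help. A clip that resembles no enrolled voiceprint scores below the threshold and comes back as Unknown.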
Add a Person
0 / 3 samples
Sample 1 of 3
Speak a full sentence naturally, as you normally would.
Sample 2 of 3
Say something different — a different sentence or topic.
Sample 3 of 3
One more — speak naturally for 3-5 seconds.
Enrolled Persons