The voice recorder app on my Android phone was recently upgraded to label speakers in real time as it records (see image). As I watched it work, it would assume that one person was continuing to speak, then suddenly change its mind and go back and relabel a sentence as belonging to a different speaker.
This process, called "speaker diarization," seems endlessly complex to me (and is so easy for human brains that we take it for granted). The program would have to detect when a new speaker starts talking, look for patterns that are unique to that person's speech, then analyze subsequent recorded speech to match it against those patterns and label the speaker.
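To make those steps concrete for myself, here is a toy Python sketch of the "match the pattern and label the speaker" part. It is only an illustration under big assumptions, not how my recorder app actually works: I pretend each chunk of speech has already been boiled down to a short list of numbers (a made-up "voice fingerprint"), and the names `label_segments` and `threshold` are my own inventions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def label_segments(fingerprints, threshold=0.8):
    """Give each segment a speaker number, creating new speakers as needed.

    Hypothetical helper: each speaker is summarized by the running average
    of the fingerprints assigned to them so far.
    """
    centroids = []  # one running-average fingerprint per known speaker
    counts = []     # how many segments each speaker has so far
    labels = []
    for fp in fingerprints:
        scores = [cosine(fp, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is None or scores[best] < threshold:
            # Nobody known sounds similar enough: treat this as a new speaker.
            centroids.append(list(fp))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Close match: label the segment and fold it into that speaker's average.
            n = counts[best]
            centroids[best] = [(c * n + x) / (n + 1) for c, x in zip(centroids[best], fp)]
            counts[best] = n + 1
            labels.append(best)
    return labels

# Made-up two-number fingerprints: two voices taking turns.
segments = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [1.0, 0.0], [0.2, 0.9]]
labels = label_segments(segments)
print(labels)  # [0, 0, 1, 0, 1]
```

The hard part, which this sketch skips entirely, is turning raw audio into fingerprints that stay stable for one person and differ between people.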
Just recognizing when one person stops and another starts seems like a hard problem on its own. And from what I saw, the app seems to be constantly recalculating whether the current audio is continued speech from one speaker or another speaker has started.
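The "change its mind and go back" behavior could be imitated, very roughly, by re-checking recent labels every time new speech arrives. The Python sketch below is my guess at the general idea, not the app's real method: the fingerprints are made-up numbers, and `revise` and `window` are hypothetical names. Each recent segment is compared against the average of every speaker's other segments and moved to whichever speaker it now resembles most.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mean(vectors):
    """Element-wise average of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def revise(fingerprints, labels, window=3):
    """Re-check the last `window` segments and relabel any that fit another speaker better."""
    revised = list(labels)
    start = max(0, len(fingerprints) - window)
    for i in range(start, len(fingerprints)):
        best_label, best_score = revised[i], -1.0
        for speaker in sorted(set(revised)):
            # Leave segment i out so it cannot vote for its own current label.
            others = [fp for j, fp in enumerate(fingerprints)
                      if revised[j] == speaker and j != i]
            if not others:
                continue
            score = cosine(fingerprints[i], mean(others))
            if score > best_score:
                best_label, best_score = speaker, score
        revised[i] = best_label
    return revised

# The middle segment was first assumed to be speaker 0 continuing to talk.
fingerprints = [[1.0, 0.0], [0.6, 0.7], [0.1, 1.0]]
provisional = [0, 0, 1]
revised = revise(fingerprints, provisional)
print(revised)  # [0, 1, 1]
```

Here the middle segment is relabeled once the third one arrives, because it sounds more like the newcomer than like the first speaker. That is at least similar in spirit to watching the app go back and fix a sentence.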
As I watched my voice recorder transcribe, it was surprisingly good at this, even when two people had overlapping banter and fast back-and-forth conversation. Amazing for handheld technology.