Diarization, the process of partitioning a speech sample into distinct, homogeneous segments according to who spoke when, doesn’t come as easily to machines as it does to humans, and training a machine learning algorithm to perform it is tougher than it sounds.
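To make "homogeneous segments" concrete, here is a minimal Python sketch of what a diarization result looks like. It assumes the hard part is already done: every short audio frame has a speaker label. The sketch only merges consecutive frames with the same label into `(start, end, speaker)` segments. The function name `frames_to_segments`, the 0.5-second frame length, and the labels are hypothetical values chosen for illustration. This is not how any particular diarization system works internally.

```python
from itertools import groupby
from typing import List, Tuple


def frames_to_segments(
    labels: List[str], frame_sec: float = 0.5
) -> List[Tuple[float, float, str]]:
    """Merge runs of identical per-frame speaker labels into segments.

    Assumes (hypothetically) that each label covers exactly `frame_sec`
    seconds of audio and that labels are already known.
    """
    segments: List[Tuple[float, float, str]] = []
    index = 0  # index of the first frame in the current run
    for speaker, run in groupby(labels):
        length = sum(1 for _ in run)  # number of consecutive frames for this speaker
        # Derive times from frame indices to avoid accumulating float error.
        segments.append((index * frame_sec, (index + length) * frame_sec, speaker))
        index += length
    return segments


if __name__ == "__main__":
    # Hypothetical frame labels: speaker A, then B, then A again.
    labels = ["A", "A", "B", "B", "B", "A"]
    for start, end, speaker in frames_to_segments(labels):
        print(f"{start:.1f}-{end:.1f}s: speaker {speaker}")
```

Real systems must infer those per-frame labels from raw audio, often without knowing how many speakers are present. That inference step is what makes the task difficult for machines.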