It’s clear by now that our natural tendency is to accept machines that can feign humanness, even when there’s no logical reason to do so. That makes it easier for us to transition to a digital world, but it often muddies the question of what is genuine AI, what is mere simulacrum, and what is somewhere in between. In a Wired piece, Vlad Sejnoha uses Spike Jonze’s Her to take a look at the future of computerized assistants. An excerpt:
“One of the most compelling aspects of Samantha is that she behaves in an utterly human-like manner, with a true sense of what is humorous and sad. This requires a still higher level of reasoning, and huge challenges remain in truly understanding, and programming, social relationships, emotional ties, and humor, which are all part of everyday knowledge. It is more conceivable that we will be able to make a system understand why a person feels sad or happy (in the most primitive terms, perhaps because it recognizes goal failure or goal success) than that we will actually simulate or replicate visceral feelings in machines.
Is it necessary to make intelligent systems human-like?
Much of human behavior is motivated by emotions rather than by black-and-white logical arguments (search through any popular online news blog for evidence!). The machine thus needs to understand, to some degree, why a human is doing something or wants something done, just as much as we demand an explanation from it about its own behavior. There is also a very practical reason to want this: in order to interact effectively, we need a model of the ‘other,’ whether it’s an app or a person. At a high level of sophistication, it will be faster and more efficient to start from the models we already have of humans than to slowly discover the parameters of a wholly alien and new ‘AI tool.’
There is also that astonishing voice… Samantha had us at that first playful and breathy ‘Hi.’
The amazing emotional range and subtle modulation of Samantha’s voice are beyond what today’s speech synthesis can produce, but the technology is on a trajectory to cross the ‘uncanny valley’ (the awkward zone of ‘close but not quite human’ performance) in the next few years. New speech generation models, driven in part by machine learning as well as by explicit knowledge of the meaning of the text, will be able to produce artificial voices with impressively natural characteristics and an absence of artifacts.”
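For a sense of how far off-the-shelf synthesis still is from that, here is a minimal sketch using the open-source pyttsx3 Python library (an assumption of convenience; Sejnoha names no particular tools). A basic engine like this exposes only coarse, global controls such as speaking rate, volume and a choice of preset voices, with no model of what the text means, let alone its emotional color.

    # Minimal sketch of today's coarse text-to-speech controls, assuming pyttsx3 is installed.
    import pyttsx3

    engine = pyttsx3.init()

    # The only knobs are flat, global settings; there is no "playful" or "breathy" option.
    engine.setProperty('rate', 150)    # speaking rate in words per minute
    engine.setProperty('volume', 0.9)  # volume from 0.0 to 1.0

    # Pick the first preset system voice, if any are available.
    voices = engine.getProperty('voices')
    if voices:
        engine.setProperty('voice', voices[0].id)

    # The engine reads the string verbatim, with no knowledge of its meaning.
    engine.say("Hi.")
    engine.runAndWait()

The gap Sejnoha describes is precisely the difference between these global parameters and a system that shapes delivery from the meaning of the words themselves.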