I did this in WoW around the first expansion's release (2007?). I leveled a shaman and a warrior together, with the shaman controlled almost entirely by voice using just follow, stop follow, frost shock, totem, heal, and a few other commands.
The problem is input latency. You think it'll be like...