From BonziBuddy and Bob to Siri, Alexa, and Cortana: a timeline that shows how we humanize machines while digitizing humans.
How can malware be functional? Isn’t part of internet culture built on warning us about the dangers of malicious code? In the 1990s, the internet was like the wild west (even more so than today), so it wasn’t uncommon for us to install all kinds of programs that we found while browsing those primitive websites. Computers were beginning to emerge as the technology that would simplify our lives and automate our everyday tasks.
Towards the end of the ’90s, a virtual assistant called BonziBuddy appeared on the internet. The program was created by the company Bonzi Software, owned by the Bonzi brothers, Joe and Jay, who already had experience in developing applications. They decided to launch their first virtual assistant, based on Microsoft Agent technology. The initial version simply reused one of the characters that served as assistants in the Windows operating system, so they had to create their own mascot, which to this day brings back bad memories for those who had the misfortune of installing BonziBuddy on their computers: the classic, infamous purple monkey. Once the program was running, the monkey welcomed us, and the first thing we noticed was that this assistant could draft and send emails “automatically.”
But this assistant wasn’t limited to just helping with emails; it could also “listen” to voice commands, play games, provide fun facts, tell jokes, sing songs, share stories, and serve as an internet search assistant. However, what caught the most attention was the ability to interact with the mascot. It was believed that BonziBuddy was one of the first functional artificial intelligences because the program had “text-to-speech” capability, allowing it to read text aloud in real time. It was available as a free direct download from its website for anyone who wanted to try it, and many people installed the virtual assistant and spent hours exploring what the software had to offer.
However, its “text-to-speech” capability, while revolutionary at the time, didn’t compare to the fact that BonziBuddy learned from the user’s preferences and provided a personalized experience based on this data. It seemed like the beginning and revolution of artificial intelligence would go hand in hand with Bonzi Software and its flagship product, far more advanced than any program of that era.
But the joy of interacting with the purple monkey was short-lived because the application required registration, including the user’s full address, and the Premium version also asked for banking information. Why would an unknown company want detailed user information? The answer was murky, yet sensitive information was being handed over to a free program from a relatively obscure company.
Shortly after the program was installed, random pop-up windows would start appearing on the computer, claiming that a virus had infected it and that it was necessary to purchase InternetALERT, a program developed by Bonzi Software. Once purchased, it magically made the notifications disappear, and it was even marketed as anti-malware software. Pure irony.
However, the most serious issue didn’t stop there: the information gathered by this company was sold for generating personalized ads, and once the assistant was uninstalled from the computer, traces of the software remained, allowing it to monitor the activities of victims without their consent. It was the year 2000.
By this point, we might be wondering: twenty-three years later, major companies make little effort to hide the thriving business involving the personal data we provide on many websites, so why demonize BonziBuddy so much? Were they visionaries in selling personal information to advertising companies? Do we ever read the agreements on the pages where we register?
The only difference is that nowadays, nobody reads the license agreements, and BonziBuddy simply didn’t present us with a legal agreement.
Bonzi had everything it needed to position itself as one of the first functional and entertaining artificial intelligences accessible to the average internet user, but its unethical practices left it as a bad memory of its time. Of course, the development of virtual assistants in general was not as tumultuous, and after a pause of a few years, they are once again becoming a standard in current technology. But how did this advancement occur throughout history?
Evolution of Assistants
The Pioneers: Audrey, the “IBM ShoeBox,” and Harpy
While IBM’s team would mark a milestone in the research and development of artificial intelligence, the “IBM ShoeBox” was preceded by a primitive assistant built in 1952 at Bell Laboratories (heirs of telephone inventor Alexander Graham Bell), called Audrey. Audrey had limited voice recognition capabilities: it could only interpret the digits 0 to 9, with an accuracy of about 90%, and only when its creator dictated them. Accuracy dropped to 70-80% when someone else dictated the numbers. These limitations, along with the fact that the machine stood approximately 1.8 meters tall, meant the project received little publicity, but it laid the foundation on which IBM’s technology and subsequent developments would be built.
The IBM ShoeBox was an electronic device capable of voice recognition. It could print on paper the commands that were dictated to it. A machine that could recognize human speech! It was undoubtedly a technological milestone, able to recognize 16 spoken words, including the digits 0 to 9. However, this device wasn’t limited to just interpreting voice commands; it could also perform arithmetic operations on the numbers dictated to it and print the result. While today’s virtual assistants have become more sophisticated, and we’ve grown accustomed to interacting with them, ShoeBox represented a new paradigm in the development of computer technology. Bear in mind that this device was introduced in 1961.
With the support of Carnegie Mellon University, a voice recognition system called Harpy was developed in the 1970s. It was based on the designs of earlier systems, which gave it significant recognition capabilities and a robustness that its predecessors lacked. Bruce T. Lowerre’s thesis, “The Harpy Speech Recognition System,” explains that tools from two earlier systems were combined and some of their functions enhanced, giving birth to a system able to recognize around 1,000 words. It was based on the hidden Markov model, still vital to many AI systems today, which allowed it to determine words through statistical probability and predict the next dictated word. I won’t delve into the model’s full details, since they are too complex for this article, but the above covers what is necessary here.
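For readers who want a feel for the idea of determining words through statistical probability, the following is a minimal sketch of a hidden Markov model decoded with the Viterbi algorithm. It is not Harpy’s actual code: the three-word vocabulary, the sound labels ("kol", "hom", "nau"), and every probability are invented purely for illustration.

```python
# Toy hidden Markov model. Hidden states are words; observations are rough
# sound labels. Vocabulary, labels, and probabilities are all hypothetical.
STATES = ("call", "home", "now")

# Probability that an utterance starts with each word.
START_P = {"call": 0.6, "home": 0.3, "now": 0.1}

# Probability of moving from one word to the next.
TRANS_P = {
    "call": {"call": 0.1, "home": 0.7, "now": 0.2},
    "home": {"call": 0.1, "home": 0.1, "now": 0.8},
    "now": {"call": 0.4, "home": 0.3, "now": 0.3},
}

# Probability that a word produces a given sound label.
EMIT_P = {
    "call": {"kol": 0.7, "hom": 0.1, "nau": 0.2},
    "home": {"kol": 0.1, "hom": 0.8, "nau": 0.1},
    "now": {"kol": 0.1, "hom": 0.2, "nau": 0.7},
}


def viterbi(observations):
    """Return (probability, words) for the most likely word sequence."""
    # best[state] = (probability of the best path ending in state, that path)
    best = {s: (START_P[s] * EMIT_P[s][observations[0]], [s]) for s in STATES}
    for obs in observations[1:]:
        step = {}
        for cur in STATES:
            # Pick the predecessor that makes reaching `cur` most probable.
            prob, path = max(
                ((best[prev][0] * TRANS_P[prev][cur], best[prev][1]) for prev in STATES),
                key=lambda item: item[0],
            )
            step[cur] = (prob * EMIT_P[cur][obs], path + [cur])
        best = step
    return max(best.values(), key=lambda item: item[0])


def predict_next(word):
    """Return the most probable word to follow `word`."""
    return max(TRANS_P[word], key=TRANS_P[word].get)


probability, words = viterbi(["kol", "hom", "nau"])
print(words, round(probability, 6))  # ['call', 'home', 'now'] 0.131712
print(predict_next("call"))  # home
```

The decoder never hears words directly. It only sees noisy sound labels and picks the word sequence that best explains them, which is the sense in which recognition here is statistical rather than exact.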
It wasn’t until the 1990s that virtual assistants were introduced, leading to exciting technologies. Microsoft created 3D-modeled characters that many of us remember with nostalgia.
IBM, on the other hand, improved its project and introduced Tangora, based on language interpretation. It could analyze 20,000 words in three languages (English, Italian, and French). It offered two modes of use, dictating complete words or spelling them out, along with interpretation capability and the ability to choose how letters were pronounced. Leaving the technical details aside, we can conclude from this section that voice recognition technology, the foundation of assistants, continued to evolve and become more sophisticated.
Microsoft Agents and the Failure of BOB
The technology that would become Microsoft Agent arrived on Windows systems in 1995 as part of the BOB project, a program born as a user-friendly alternative to the operating system’s native desktop, which was difficult for inexperienced users to navigate. This was the first time virtual assistant characters were implemented, with Rover being the first: a curious yellow dog who guided us through the program’s setup. The interface mimicked the rooms of a house, with interactive on-screen elements, allowing users to move from room to room. However, the application could be overwhelming due to the amount of text displayed, the numerous activities, the option to customize virtual environments, and the ability to change the character serving as the assistant. It felt more like a video game than a traditional interface and had errors that caused blue screens when using some functions, possibly because BOB required more resources than the average computer of the time could offer.
Microsoft BOB failed miserably, and the desktop as we know it today remained the default interface, which raises an obvious question: “What does this software have to do with the title of this article?” To which I can respond: Microsoft Agent. This was the technology that was successfully implemented within the Microsoft Office suite. Anyone who used office software in the 1990s will know Clippy, the paperclip-shaped character that moved across a document and was there to answer our questions and provide guidance, with text-to-speech capabilities included. These characters became popular with Office 97. Although these agents were relatively famous and useful in the 1990s, they were eventually discontinued and became incompatible with newer versions of the operating system. However, this didn’t stop Microsoft from continuing to develop text-to-speech and virtual assistant technology.
All this technological progress would reach its peak in the following years, when we would encounter AI-based assistants within the Internet of Things (IoT). Many of them became more sophisticated, and now we have several options available for devices that implement advanced versions of text-to-speech. However, this wouldn’t have been possible without the development of Microsoft’s SAPI (Speech API). Before moving on to the next section, it’s important to understand what an API is: in the world of computing, the term refers to an interface with its own protocols and definitions that enables communication between two applications. An API allows two pieces of software to exchange information, opening up many possibilities for sharing functionality, as when a third-party application makes use of Google’s web translator.
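The definition of an API given above can be illustrated with a short sketch. This is not Microsoft’s SAPI or any real speech library: the names `TextToSpeechApi`, `ToySpeechEngine`, `synthesize`, and `read_aloud` are hypothetical, and the "engine" returns labeled text tokens instead of audio so that the example stays runnable anywhere.

```python
from typing import Protocol


class TextToSpeechApi(Protocol):
    """The agreed interface: what a caller may request and what it gets back."""

    def synthesize(self, text: str, voice: str) -> list[str]:
        ...


class ToySpeechEngine:
    """One provider of the interface (hypothetical).

    A real engine would return audio data; this one returns labeled tokens.
    """

    def synthesize(self, text: str, voice: str) -> list[str]:
        return [f"{voice}:{word.lower()}" for word in text.split()]


def read_aloud(engine: TextToSpeechApi, message: str) -> str:
    """A client application that knows only the interface, not the engine."""
    tokens = engine.synthesize(message, voice="assistant")
    return " ".join(tokens)


print(read_aloud(ToySpeechEngine(), "Hello from the purple monkey"))
```

Because `read_aloud` depends only on the interface, the toy engine could be swapped for any other provider that offers the same `synthesize` method without changing the client, which is the kind of shared functionality an API makes possible.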
Voice Recognition, the Internet of Things, and the Definitive Implementation of Artificial Intelligence: Siri, Cortana, Bixby, and Alexa
APIs based on voice recognition opened up a wide range of possibilities. Speech APIs facilitate not only the exchange of textual information but also the interpretation and recognition of speech (which the device ultimately translates into text, but which makes the interaction more user-friendly). As a result, it was now possible to interact locally with a three-dimensional software character and to query the internet directly using spoken language.
When the internet started interacting with objects beyond computers, the concept of the Internet of Things (IoT) was born. It encompasses any physical object with the ability to connect to the internet, including vehicles, watches, speakers, light bulbs, and many other everyday objects. The Internet of Things has become a standard, and it’s hard to imagine life without so many devices relying on a constant internet connection to perform their tasks. Simultaneously, speech interpretation and recognition evolved to the point that a microphone-equipped speaker could make our lives easier.
Speech APIs allow these objects to be compatible and communicate with other devices and the internet, making voice recognition possible in devices like mobile phones, as seen with Siri and the assistants that followed.
Siri originated in 2007 as a spin-off backed by SRI Ventures. Its team developed the speech recognition technology that would become globally known through its integration into Apple’s mobile phones. Apple acquired this technology on April 28, 2010, and implemented it in the iPhone 4S in October 2011. With the arrival of this device, virtual assistants gained relevance. It became possible to consult Siri at any time (the famous hands-free command “Hey Siri” arrived in a later software update) and see how it adapted to user preferences, which was the most notable feature at the time.
These features weren’t the only things Siri offered. It also allowed interaction between the phone’s own applications: setting alarms by voice command, marking dates on the calendar, dialing contacts from the address book, and browsing the internet, among other functions. The ability to use voice commands to perform tasks was undoubtedly the most attractive feature, leading to the emergence of competitors such as Cortana from Microsoft and Bixby from Samsung, with Google also working on its own voice assistant. Initially, virtual assistants were implemented in device operating systems, with Bixby in Samsung’s Galaxy series and Cortana on Windows. While they had similar functionalities, Bixby faced some voice interpretation and functional issues. It wasn’t until Amazon’s device arrived that voice recognition technology became more widespread and firmly entrenched in our lives.
Alexa brought about a revolution. It was implemented in Amazon’s Echo line of smart speakers, such as the Echo Dot, which allow for voice commands, connection to streaming services like Spotify or Amazon’s own service (Amazon Music), and much more. It’s even capable of interacting with screens, lights, plugs, humidifiers, and a long list of compatible devices. This might sound like a commercial, but it’s not: the compatibility of Alexa within Echo Dot speakers is truly incredible. It’s important to remember that this capability stems from the integration of voice recognition technologies with the IoT.
Up to this point, I cannot stress enough how concerning it could be to automate every detail of our daily lives. These devices are creating a dependency that appears abnormal when we consider the extent to which virtual assistants are reaching into our lives.
Google, Samsung, and Apple have released their own devices, but they haven’t reached the same level of popularity as Amazon’s. For example, Apple’s HomePod requires that connected devices be compatible with Apple’s own ecosystem. Samsung, for its part, has managed to overcome Bixby’s early issues, while Google, with its Nest ecosystem, offers a more natural assistant with greater conversational capabilities.
Each option has its pros and cons, like any product on the market. However, the main focus of this article is not to determine which device is better. It was important to highlight the capabilities of each speaker. What is interesting and relevant to the topic is the tremendous advancement in voice recognition technologies. The integration of artificial intelligence within these models raises fascinating questions. It has been announced that Alexa will feature an AI-based language model, enabling conversations similar to those with ChatGPT. As a result, interacting with these devices will become increasingly natural.
This leads me to wonder if we might be moving away from human interaction. At the end of the day, communicating with devices is simpler and requires no extra effort, especially considering that it is as easy as using spoken language. This ease brings us closer to these devices and fosters a sense of attachment to them. It may sound exaggerated, but one only needs to look at the parasocial relationships we have formed with internet content creators. It’s also fascinating how parasocial relationships exist in the realm of VTubers, with animated avatars on the screen that evoke empathy for likable characters. It doesn’t seem so far-fetched that some individuals may replace their social life with interactions with intelligent devices.
Voice recognition technology has evolved from powering basic assistants to becoming an industry standard. Now, powered by artificial intelligence, it has given a name to the use of these technologies in the Internet of Things devices that impact our daily lives: home automation. The term refers to a set of smart technologies that enable the automation of tasks, such as controlling locks, lighting, thermostats, security alarms, and, well, any device within a home, simply through voice commands. It will be more than interesting to see how these technologies evolve alongside home automation and how they change the way we interact with each other and with technology. This can only evoke both fear and hope.
Written by Victor Barba.
Translated by ChatGPT. Corrections by Silvina Canon.
U.S. order against Bonzi Software: “BONZI SOFTWARE, INC.” (ftc.gov)
Research on Bonzi: https://www.thefastcode.com/article/a-brief-history-of-bonzibuddy-the-internet-s-most-friendly-malware
History of speech recognizers: https://www.bbc.com/future/article/20170214-the-machines-that-learned-to-listen
IBM ShoeBox: https://www.ibm.com/ibm/history/exhibits/specialprod1/specialprod1_7.html
On the failure of Microsoft BOB: https://www.fayerwayer.com/internet/2023/06/13/microsoft-bob-el-mayor-fracaso-de-bill-gates-cinco-curiosidades-sobre-el-fiasco/
On Bixby: https://www.pocket-lint.com/es-es/smartphones/noticias/samsung/140128-que-es-bixby-asistente-de-samsung-explicado-y-como-usarlo/
History of Siri: https://www.nextu.com/blog/la-historia-detras-de-siri-rc22/