Personal voice assistant using Python

This assistant can open applications, search the web, YouTube and Wikipedia, and query Wolfram Alpha for answers to your questions. It is built from a chain of simple elif statements that you can customise completely.

Basic Requirements:

Getting started is pretty simple. There are just three prerequisites and then you are on your way. You’re going to need:

Python 3
Pip 3
A good text editor such as Visual Studio Code

Step 1: Installing the packages

There are a few packages to install first, such as gTTS, PyAudio and playsound. Installing them is simple: just run the following commands in the Command Prompt or Terminal.

$:~ pip3 install gTTS

The above command installs the Google Text-to-Speech (gTTS) library, which converts the assistant's text replies into speech.

$:~ pip3 install SpeechRecognition

The above command installs the Speech Recognition package which understands our audio and converts it into text.

$:~ pip3 install -U selenium

The above command installs the Selenium WebDriver package, which controls the browser and runs the web searches.

$:~ pip3 install wolframalpha

The above command installs the wolframalpha package, a Python client for the Wolfram Alpha API, which answers the calculation queries you ask.

$:~ pip3 install playsound

The above command installs the Playsound package which plays the saved audio file from your computer.

$:~ sudo apt-get install python3-pyaudio

The above command installs the PyAudio package on Debian or Ubuntu. PyAudio is what ‘listens’ for your voice through the microphone.
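Before moving on, it can help to confirm that every package can actually be imported. Here is a minimal sketch of such a check. The helper name missing_modules and the module-to-package table are my own choices for illustration, and the import names are the ones used in the code of Step 2.

```python
import importlib.util

# Import names differ from the installed package names in a few cases,
# so map each import name to the name you would install.
REQUIRED_MODULES = {
    "gtts": "gTTS",
    "speech_recognition": "SpeechRecognition",
    "selenium": "selenium",
    "wolframalpha": "wolframalpha",
    "playsound": "playsound",
    "pyaudio": "PyAudio",
}


def missing_modules(modules=REQUIRED_MODULES):
    """Return the package names whose import module cannot be found."""
    return [
        package
        for module, package in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("All packages are installed.")
```

If anything shows up as missing, rerun the matching install command before continuing.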

Step 2: Getting started with the code

    import speech_recognition as sr  # to recognise your audio
    import playsound  # to play the saved mp3 file
    from gtts import gTTS  # Google Text-to-Speech, which converts text into speech
    import os  # to delete the temporary audio files and open applications
    import wolframalpha  # to answer any query the user asks
    import random  # to pick a random song which not even the user can predict
    from selenium import webdriver  # to control browser operations
    from selenium.webdriver.chrome.options import Options

    num = 1

    def assistant_speaks(output):
        global num
        # num gives every audio file a different name,
        # so a new file never clashes with the previous one
        num += 1
        print("Jarvis : ", output)
        toSpeak = gTTS(text=output, lang='en-IN', slow=False)
        # saving the audio file returned by Google Text-to-Speech
        file = str(num) + ".mp3"
        toSpeak.save(file)
        # playsound plays the saved file (True blocks until playback ends)
        playsound.playsound(file, True)
        os.remove(file)
      
    def get_audio():
        rObject = sr.Recognizer()
        audio = ''
        with sr.Microphone() as source:
            print("Speak..")
            # recording the audio using speech recognition
            audio = rObject.listen(source, phrase_time_limit=7)
        print("Stop.")  # recording stops after at most 7 seconds
        try:
            text = rObject.recognize_google(audio, language='en-IN')
            print("You : ", text)
            return text
        except (sr.UnknownValueError, sr.RequestError):
            # the speech was unintelligible or the recognition service was unreachable
            assistant_speaks("Couldn't understand your audio. Please try again!")
            return 0

    def search_web(input):
        options = Options()
        options.add_argument('start-maximized')
        options.add_argument('disable-infobars')
        # recent Selenium versions take the options through the `options` keyword
        driver = webdriver.Chrome(options=options)
        driver.implicitly_wait(1)
        words = input.split()
        lowered = input.lower().split()
        if 'youtube' in lowered:
            assistant_speaks("Opening in YouTube!")
            query = words[lowered.index('youtube') + 1:]
            # join the words; str(query) would put the list brackets into the URL
            driver.get("https://www.youtube.com/results?search_query=" + '+'.join(query))
        elif 'wikipedia' in lowered:
            assistant_speaks("Opening Wikipedia")
            query = words[lowered.index('wikipedia') + 1:]
            driver.get("https://en.wikipedia.org/wiki/" + '_'.join(query))
        elif 'maps' in lowered:
            assistant_speaks("Opening Google Maps")
            query = words[lowered.index('maps') + 1:]
            driver.get("https://www.google.com/maps/place/" + '_'.join(query))
        elif 'google' in lowered:
            query = words[lowered.index('google') + 1:]
            driver.get("https://www.google.com/search?q=" + '+'.join(query))
        elif 'search' in lowered:
            query = words[lowered.index('search') + 1:]
            driver.get("https://www.google.com/search?q=" + '+'.join(query))
        else:
            driver.get("https://www.google.com/search?q=" + '+'.join(words))
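The search function glues the spoken words straight into a URL, which can break when a query contains characters such as & or ?. Here is a minimal sketch of the same idea with the query escaped by urllib.parse.quote_plus from the standard library. The helpers words_after and build_search_url are hypothetical names of mine, and only the YouTube and Google cases are shown.

```python
from urllib.parse import quote_plus


def words_after(keyword, text):
    """Return the words that follow the first occurrence of keyword."""
    words = text.split()
    lowered = [word.lower() for word in words]
    if keyword not in lowered:
        return words
    return words[lowered.index(keyword) + 1:]


def build_search_url(text):
    """Map a spoken command to a search URL (hypothetical helper)."""
    if "youtube" in text.lower().split():
        query = " ".join(words_after("youtube", text))
        return "https://www.youtube.com/results?search_query=" + quote_plus(query)
    # anything else falls back to a plain Google search
    return "https://www.google.com/search?q=" + quote_plus(text)


print(build_search_url("Youtube lo-fi beats & rain"))
```

The returned string could then be handed to driver.get() in place of the hand-built URLs.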

The main functions of the program are get_audio() and assistant_speaks(). The get_audio() function “listens” to what you say through the microphone; each phrase is limited to 7 seconds (you can change this through phrase_time_limit). The assistant_speaks() function speaks the output aloud once the computer has processed your query. So now you probably understand how the assistant is going to work. If not, don’t fret. I didn’t get it the first time either :)
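If you want to try the listen and speak cycle without a microphone or speakers, here is a rough sketch that takes the two functions as parameters, so scripted stand-ins can replace get_audio() and assistant_speaks(). The name run_assistant and the canned inputs are assumptions of mine, not part of the original code.

```python
def run_assistant(listen, speak, handle, max_turns=10):
    """Drive the listen -> handle -> speak cycle with injectable functions."""
    for _ in range(max_turns):
        text = listen()
        if not text:  # get_audio() returns 0 when recognition fails
            speak("Please try again.")
            continue
        if "bye" in text.lower():
            speak("Ok bye.")
            break
        speak(handle(text))


# Scripted stand-ins: three "heard" phrases, and a list that collects the replies.
scripted = iter(["who made you", 0, "bye"])
spoken = []
run_assistant(
    listen=lambda: next(scripted),
    speak=spoken.append,
    handle=lambda text: "You said: " + text,
)
print(spoken)
# → ['You said: who made you', 'Please try again.', 'Ok bye.']
```

Passing get_audio and assistant_speaks as listen and speak would run the same cycle with real audio.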

Step 3: Random Fun

We use a chain of elif statements so that the assistant can answer some questions on its own, without searching the web. Examples are queries like “Who made you” and “Where do you live”.

def process_text(input):
    try:
        if 'search' in input or 'play' in input:
            # open the query in a browser driven by Selenium
            search_web(input)
        elif "who made you" in input or "who created you" in input:
            speak = "I have been created by Chivukula Virinchi."
            assistant_speaks(speak)
        elif "what is your name" in input or "who are you" in input:
            speak = "Did I forget to introduce myself? I am your personal assistant. Assistance is my middle name."
            assistant_speaks(speak) 
        elif "when is your birthday" in input:
            speak = "I go through lots and lots of updates. So that's about 365-birthdays."
            assistant_speaks(speak)        
        elif "where do you live" in input:
            speak = "I’m stuck inside a device!! Help! Just kidding. I like it in here. Sometimes I hang out in the Cloud. It gives me a great view of the World Wide Web."
            assistant_speaks(speak)
        elif "do you sleep" in input or "when do you sleep" in input:
            speak = "I take power naps when we aren't talking."
            assistant_speaks(speak)                
        elif "self-destruct" in input:
            speak = "Commencing Self-Destruct protocol in T-minus 2 seconds Boom! Actually I think I'll stick around"
            assistant_speaks(speak)
        elif "what do you think about me" in input or "what is your opinion about me" in input:
            speak = "I think you're extremely cool :)"
            assistant_speaks(speak) 
        elif "sing a song" in input:
            speak = "Here is a song I composed just for lovely people like you!"
            assistant_speaks(speak)
            r = str(random.randrange(6))  # picks one of song0.mp3 ... song5.mp3
            playsound.playsound("song" + r + ".mp3", True)
        # the remaining branches of process_text() follow in Step 4

PS: The audio files with these names are not included here, so you need to download them from the GitHub repo.
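As the list of fun replies grows, the elif chain gets long. One possible alternative, sketched here under my own naming (CANNED_REPLIES and canned_reply are hypothetical), is to keep the trigger phrases and replies in a dictionary and look them up in a loop. Only two of the replies above are shown.

```python
# Each key is a tuple of trigger phrases; the value is the spoken reply.
CANNED_REPLIES = {
    ("who made you", "who created you"): "I have been created by Chivukula Virinchi.",
    ("do you sleep", "when do you sleep"): "I take power naps when we aren't talking.",
}


def canned_reply(text, replies=CANNED_REPLIES):
    """Return the reply for the first matching phrase, or None if nothing matches."""
    lowered = text.lower()
    for phrases, reply in replies.items():
        if any(phrase in lowered for phrase in phrases):
            return reply
    return None


print(canned_reply("Hey, who created you?"))
```

Adding a new reply then means adding one dictionary entry instead of another elif branch.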

Step 4: Doing the Calculations

Now it’s time for some calculations. For this, we are going to use the Wolfram Alpha API (through the wolframalpha package), which can answer a wide range of computational and factual questions.

        # continuing process_text() from Step 3
        elif "calculate" in input.lower():
            app_id = "#app_id here"  # replace with your own Wolfram Alpha App ID
            client = wolframalpha.Client(app_id)
            indx = input.lower().split().index('calculate')
            query = input.split()[indx + 1:]
            res = client.query(' '.join(query))
            answer = next(res.results).text
            assistant_speaks("The answer is " + answer)
        elif 'open' in input:
            # another function that opens
            # the different applications available
            open_application(input.lower())
        else:
            search_web(input)
    except Exception:
        assistant_speaks("Could not understand your audio. Please try again!")
    return 0

PS: You’ll need a Wolfram Alpha developer App ID, which I have not included in this tutorial. You can create your own on the Wolfram Alpha developer portal. Whenever you use this function, always say the word “calculate” so that your query is routed to the Wolfram Alpha API.
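Since the App ID is personal, you may prefer not to paste it into the source file. Here is a small sketch that reads it from an environment variable and pulls the query out of a “calculate” command. The variable name WOLFRAM_APP_ID and both function names are assumptions of mine, not something the wolframalpha package requires.

```python
import os


def get_app_id(env_var="WOLFRAM_APP_ID"):
    """Read the App ID from the environment (the variable name is an assumption)."""
    app_id = os.environ.get(env_var)
    if not app_id:
        raise RuntimeError("Set the " + env_var + " environment variable to your App ID.")
    return app_id


def calculation_query(text):
    """Return the words after 'calculate', or None if the word is absent."""
    words = text.split()
    lowered = [word.lower() for word in words]
    if "calculate" not in lowered:
        return None
    return " ".join(words[lowered.index("calculate") + 1:])


print(calculation_query("Calculate 6 factorial"))
```

The result of get_app_id() would go to wolframalpha.Client() and the result of calculation_query() to client.query(), as in the branch above.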

Step 5: The finishing touch

Now all that’s left is a few finishing touches. Add this final block to the end of your main code and you’re done!

import subprocess

# os.startfile() exists only on Windows; on other systems
# the program is started as a separate process instead.
def launch(path):
    if hasattr(os, 'startfile'):
        os.startfile(path)
    else:
        subprocess.Popen([path])

# function used to open an application
# present inside the system.
# The Chrome and Firefox paths are Linux paths; the Word and Excel
# shortcuts are Windows paths. Adjust them to match your machine.
def open_application(input):

    if "chrome" in input:
        assistant_speaks("Opening Google Chrome")
        launch('/usr/bin/google-chrome-stable')
        return

    elif "firefox" in input or "mozilla" in input:
        assistant_speaks("Opening Mozilla Firefox")
        launch('/usr/bin/firefox')
        return

    elif "word" in input:
        assistant_speaks("Opening Microsoft Word")
        launch(
            r'C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Microsoft Office 2013\Word 2013.lnk')
        return

    elif "excel" in input:
        assistant_speaks("Opening Microsoft Excel")
        launch(
            r'C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Microsoft Office 2013\Excel 2013.lnk')
        return

    else:
        assistant_speaks("Application not available")
        return

# The main loop comes last, so that every function it calls is already defined.
if __name__ == "__main__":
    # assistant_speaks("What's your name?")
    name = 'Virinchi'
    # name = get_audio()
    assistant_speaks("Hello, " + name + '.')

    while True:

        assistant_speaks("How can I help you " + name + '?')
        text = get_audio()

        if text == 0:
            continue

        if "goodnight" in str(text) or "bye" in str(text):
            assistant_speaks("Ok bye, " + name + '.')
            break

        # calling process text to process the query
        process_text(text)
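The applications that open_application() knows about could also live in a table keyed by operating system, so each machine only sees the paths that exist on it. This is just a sketch: APPLICATIONS and find_application are names I made up, the paths are the ones from the code above, and the platform keys are the values sys.platform reports on Linux and Windows.

```python
import sys

# Keyword tuples mapped to paths, grouped by the value of sys.platform.
APPLICATIONS = {
    "linux": {
        ("chrome",): "/usr/bin/google-chrome-stable",
        ("firefox", "mozilla"): "/usr/bin/firefox",
    },
    "win32": {
        ("word",): r"C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Microsoft Office 2013\Word 2013.lnk",
        ("excel",): r"C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Microsoft Office 2013\Excel 2013.lnk",
    },
}


def find_application(text, platform=sys.platform):
    """Return the path for the first keyword found in text, or None."""
    lowered = text.lower()
    for keywords, path in APPLICATIONS.get(platform, {}).items():
        if any(keyword in lowered for keyword in keywords):
            return path
    return None


print(find_application("open Firefox", platform="linux"))
# → /usr/bin/firefox
```

When find_application() returns None, the assistant can reply “Application not available”, just like the else branch does.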

Step 6: And then you’re done!

Now you have officially finished building an AI assistant! All that’s left to do now is to take it for a trial run and then show it off! You can run it in the terminal/cmd using python3 filename.py.

Step 7: Some important things to keep in mind

You can customise the above code to your own style and call it your own. With some experience of Python and a few tweaks here and there, it’s all yours. The example commands below are just there to show what the assistant can do, and you are free to experiment.
Whenever you need the assistant to get an answer from Wolfram Alpha, you must include the word “calculate” before the query.
I have included the entire code and additional instructions in a GitHub repo, which you can download. The repo also includes the audio files used to play the ‘song’ the assistant sings, along with the other files the assistant needs to work.
You are going to get errors a few times when you first run it. Debug each of them, and check that each package has been installed properly.
I have not included the steps for getting the Google token.

Step 8: Example Commands

Youtube Chivukula Virinchi: searches YouTube for my channel.
Wikipedia ISRO: searches Wikipedia for ‘ISRO’.
Maps New Delhi: searches Maps for ‘New Delhi’.
Sing a Song: plays a random audio file from the list of songs.
Where do you live: the assistant replies that it is stuck inside a device.
Self-Destruct: the assistant replies “Commencing Self-Destruct protocol in T-minus 2 seconds Boom! Actually I think I'll stick around”.
Calculate square root of 2: the assistant says the answer is 1.41421356237…
Calculate 6 factorial: the assistant replies that the answer is 720.
Cricket: searches Google for ‘Cricket’.
Calculate the formula of Methyl Isocyanate: the assistant says the answer is C₂H₃NO.