Personal voice assistant using Python
This assistant can open applications, search the web, YouTube, and Wikipedia, and query Wolfram Alpha to answer your questions. It is a stack of simple elif statements which you can customise completely.
Basic Requirements:
Getting started is pretty simple. There are just three prerequisites and then you are on your way. You're going to need:
- Python 3
- Pip 3
- A good text editor such as Visual Studio Code
Step 1: Installing the packages
A few packages need to be installed first, such as gTTS, PyAudio, and playsound. Installing them is simple: just run the following commands in the Command Prompt or Terminal.
$:~ pip3 install gTTS
The above command installs the Google Text-to-Speech (gTTS) library, which converts the assistant's text replies into speech.
$:~ pip3 install SpeechRecognition
The above command installs the Speech Recognition package which understands our audio and converts it into text.
$:~ pip3 install -U selenium
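The above command installs the Selenium WebDriver package, which lets the assistant control a web browser and run searches in it. This tutorial drives Chrome, so Chrome needs to be installed.
$:~ pip3 install wolframalpha
The above command installs the Wolfram Alpha API client (imported in the code as wolframalpha), which answers the calculation queries you ask.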
$:~ pip3 install playsound
The above command installs the Playsound package which plays the saved audio file from your computer.
$:~ sudo apt-get install python3-pyaudio
The above command installs the PyAudio package, which 'listens' for your voice through the microphone. It uses apt-get, so it applies to Debian- and Ubuntu-based systems.
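Before moving on, it can help to confirm that every package is actually importable. The following is a minimal sketch and not part of the original project: the helper name missing_modules is hypothetical, and the list holds the import names used in Step 2, which differ from some of the pip package names.

```python
import importlib.util

# Import names used by the assistant (not always the same as the pip names).
REQUIRED_MODULES = ["gtts", "speech_recognition", "selenium",
                    "wolframalpha", "playsound", "pyaudio"]


def missing_modules(names):
    """Return the names from `names` that Python cannot find on this machine."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("All packages are installed.")
```

If anything is reported as missing, rerun the matching install command before continuing.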
Step 2: Getting started with the code
import speech_recognition as sr  # to recognise your audio
import playsound  # to play a saved mp3 file
from gtts import gTTS  # Google Text-to-Speech, which converts text into speech
import os  # to delete audio files and open applications
import wolframalpha  # to calculate any query the user asks
import random  # to pick a random song
from selenium import webdriver  # to control browser operations
from selenium.webdriver.chrome.options import Options

num = 1  # counter used to give every audio file a different name
def assistant_speaks(output):
    global num
    # num renames every audio file
    # with a different name to remove ambiguity
    num += 1
    print("Jarvis : ", output)
    # newer gTTS releases may expect lang='en' with tld='co.in' instead
    toSpeak = gTTS(text=output, lang='en-IN', slow=False)
    # saving the audio file given by Google Text-to-Speech
    file = str(num) + ".mp3"
    toSpeak.save(file)
    # the playsound package plays the saved file; True waits until it finishes
    playsound.playsound(file, True)
    os.remove(file)
def get_audio():
    rObject = sr.Recognizer()
    audio = ''
    with sr.Microphone() as source:
        print("Speak..")
        # recording the audio using speech recognition
        audio = rObject.listen(source, phrase_time_limit=7)  # limit 7 secs
    print("Stop.")
    try:
        text = rObject.recognize_google(audio, language='en-IN')
        print("You : ", text)
        return text
    except (sr.UnknownValueError, sr.RequestError):
        # speech was unintelligible or the recognition service was unreachable
        assistant_speaks("Couldn't understand your audio, please try again!")
        return 0
def search_web(input):
    options = Options()
    options.add_argument('start-maximized')
    options.add_argument('disable-infobars')
    # Selenium 4 takes the options object through the `options` keyword
    driver = webdriver.Chrome(options=options)
    driver.implicitly_wait(1)
    if 'youtube' in input.lower():
        assistant_speaks("Opening in youtube!")
        indx = input.lower().split().index('youtube')
        # everything after the keyword is the search query
        query = input.split()[indx + 1:]
        driver.get("https://www.youtube.com/results?search_query=" + '+'.join(query))
    elif 'wikipedia' in input.lower():
        assistant_speaks("Opening Wikipedia")
        indx = input.lower().split().index('wikipedia')
        query = input.split()[indx + 1:]
        driver.get("https://en.wikipedia.org/wiki/" + '_'.join(query))
    elif 'maps' in input.lower():
        assistant_speaks("Opening Google Maps")
        indx = input.lower().split().index('maps')
        query = input.split()[indx + 1:]
        driver.get("https://www.google.com/maps/place/" + '+'.join(query))
    else:
        if 'google' in input.lower():
            indx = input.lower().split().index('google')
            query = input.split()[indx + 1:]
            driver.get("https://www.google.com/search?q=" + '+'.join(query))
        elif 'search' in input.lower():
            indx = input.lower().split().index('search')
            query = input.split()[indx + 1:]
            driver.get("https://www.google.com/search?q=" + '+'.join(query))
        else:
            # no keyword found, so search Google for the whole sentence
            driver.get("https://www.google.com/search?q=" +
                       '+'.join(input.split()))
The main functions of the program are get_audio() and assistant_speaks(). The get_audio() function "listens" to what you say through the microphone. The phrase time limit is set to 7 seconds, and you can change it. The assistant_speaks() function speaks the output aloud after the computer has processed your query. So now you probably understand how the assistant is going to work. If not, don't fret. I didn't get it the first time either :)
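The query handling in search_web can also be written with URL encoding, so that characters such as & inside a query do not break the address. The sketch below is an alternative under stated assumptions, not the original code: build_search_url, SEARCH_URLS, and DEFAULT_URL are hypothetical names, only the YouTube and Google cases are covered, and the function just builds the URL string that you would then pass to driver.get().

```python
from urllib.parse import quote_plus

# Keyword -> URL prefix, taken from the URLs used in search_web.
SEARCH_URLS = {
    "youtube": "https://www.youtube.com/results?search_query=",
    "google": "https://www.google.com/search?q=",
}
DEFAULT_URL = "https://www.google.com/search?q="


def build_search_url(command):
    """Build a search URL from a spoken command such as 'youtube lo-fi music'."""
    words = command.split()
    lowered = [word.lower() for word in words]
    for keyword, prefix in SEARCH_URLS.items():
        if keyword in lowered:
            # Everything after the keyword is treated as the query.
            query = " ".join(words[lowered.index(keyword) + 1:])
            return prefix + quote_plus(query)
    # No keyword found: search Google for the whole command.
    return DEFAULT_URL + quote_plus(command)
```

Matching on whole words rather than substrings also means a word like "youtuber" is not mistaken for the keyword.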
Step 3: Random Fun
We use a chain of elif statements so that the assistant can answer some questions on its own without searching the web. Examples are queries like "Who made you" and "Where do you live".
def process_text(input):
    try:
        if 'search' in input or 'play' in input:
            # open the query in a browser using selenium
            search_web(input)
        elif "who made you" in input or "who created you" in input:
            speak = "I have been created by Chivukula Virinchi."
            assistant_speaks(speak)
        elif "what is your name" in input or "who are you" in input:
            speak = "Did I forget to introduce myself? I am your personal assistant. Assistance is my middle name."
            assistant_speaks(speak)
        elif "when is your birthday" in input:
            speak = "I go through lots and lots of updates. So that's about 365-birthdays."
            assistant_speaks(speak)
        elif "where do you live" in input:
            speak = "I'm stuck inside a device!! Help! Just kidding. I like it in here. Sometimes I hang out in the Cloud. It gives me a great view of the World Wide Web."
            assistant_speaks(speak)
        elif "do you sleep" in input or "when do you sleep" in input:
            speak = "I take power naps when we aren't talking."
            assistant_speaks(speak)
        elif "self-destruct" in input:
            speak = "Commencing Self-Destruct protocol in T-minus 2 seconds Boom! Actually I think I'll stick around"
            assistant_speaks(speak)
        elif "what do you think about me" in input or "what is your opinion about me" in input:
            speak = "I think you're extremely cool :)"
            assistant_speaks(speak)
        elif "sing a song" in input:
            speak = "Here is a song I composed just for lovely people like you!"
            assistant_speaks(speak)
            # picks one of song0.mp3 ... song5.mp3 at random
            r = str(random.randrange(6))
            playsound.playsound("song" + r + ".mp3", True)
PS: The audio files (song0.mp3 to song5.mp3) are not included in this tutorial, so you need to download them from the GitHub repo.
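As the list of canned replies grows, the elif chain gets long. One alternative, shown here only as a sketch, keeps the trigger phrases and replies in a list so that adding a reply means adding one entry. The names CANNED_REPLIES and find_reply are hypothetical and not part of the original code; the replies are copied from the block above.

```python
# Each entry pairs the trigger phrases with the reply to speak.
CANNED_REPLIES = [
    (("who made you", "who created you"),
     "I have been created by Chivukula Virinchi."),
    (("do you sleep", "when do you sleep"),
     "I take power naps when we aren't talking."),
    (("what do you think about me", "what is your opinion about me"),
     "I think you're extremely cool :)"),
]


def find_reply(command):
    """Return the canned reply for `command`, or None if no phrase matches."""
    lowered = command.lower()  # match regardless of capitalisation
    for phrases, reply in CANNED_REPLIES:
        if any(phrase in lowered for phrase in phrases):
            return reply
    return None
```

Inside process_text you could call find_reply(input) first and pass any reply that is not None to assistant_speaks, falling through to the other branches otherwise.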
Step 4: Doing the Calculations
Now it's time for some calculations. For this, we are going to use the Wolfram Alpha API, which can answer a wide range of computational and factual questions.
        elif "calculate" in input.lower():
            app_id = "#app_id here"
            client = wolframalpha.Client(app_id)
            indx = input.lower().split().index('calculate')
            # everything after the word 'calculate' is sent to Wolfram Alpha
            query = input.split()[indx + 1:]
            res = client.query(' '.join(query))
            answer = next(res.results).text
            assistant_speaks("The answer is " + answer)
        elif 'open' in input:
            # another function to open
            # the different applications available
            open_application(input.lower())
        else:
            search_web(input)
    except Exception:
        assistant_speaks("Could not understand your audio, Please try again!")
        return 0
PS: You'll need a Wolfram Alpha developer App ID, which I have not included in this tutorial. Create your own App ID and paste it in place of "#app_id here". Whenever you use this function, always use the word calculate so that your query is sent to the Wolfram Alpha API.
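Rather than pasting the App ID into the source, one option is to read it from an environment variable and to handle the case where Wolfram Alpha returns no result. This is a sketch under assumptions: the variable name WOLFRAM_APP_ID and the helpers get_app_id and first_answer are hypothetical, and first_answer relies only on each result having a .text attribute, as the code above does.

```python
import os


def get_app_id(environ=os.environ):
    """Read the App ID from the (hypothetical) WOLFRAM_APP_ID variable."""
    app_id = environ.get("WOLFRAM_APP_ID", "").strip()
    if not app_id:
        raise RuntimeError("Set WOLFRAM_APP_ID to your Wolfram Alpha App ID.")
    return app_id


def first_answer(results, fallback="Sorry, I could not find an answer."):
    """Return the text of the first result, or `fallback` if there is none."""
    first = next(iter(results), None)
    if first is None or not getattr(first, "text", None):
        return fallback
    return first.text
```

In the calculate branch, this would mean creating the client with get_app_id() and passing res.results to first_answer instead of calling next() on it directly.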
Step 5: The finishing touch
Now all that's left is a few finishing touches. Add the code below to the end of your file and you're done!
import subprocess  # used below to launch Linux applications

# function used to open applications
# present on the system; it has to be defined
# before the main loop below starts running
def open_application(input):
    if "chrome" in input:
        assistant_speaks("Opening Google Chrome")
        # Linux path; os.startfile exists only on Windows
        subprocess.Popen(['/usr/bin/google-chrome-stable'])
        return
    elif "firefox" in input or "mozilla" in input:
        assistant_speaks("Opening Mozilla Firefox")
        subprocess.Popen(['/usr/bin/firefox'])
        return
    elif "word" in input:
        assistant_speaks("Opening Microsoft Word")
        # Windows shortcut; the raw string keeps the backslashes literal
        os.startfile(
            r'C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Microsoft Office 2013\Word 2013.lnk')
        return
    elif "excel" in input:
        assistant_speaks("Opening Microsoft Excel")
        os.startfile(
            r'C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Microsoft Office 2013\Excel 2013.lnk')
        return
    else:
        assistant_speaks("Application not available")
        return

if __name__ == "__main__":
    # assistant_speaks("What's your name?")
    name = 'Virinchi'
    # name = get_audio()
    assistant_speaks("Hello, " + name + '.')
    while True:
        assistant_speaks("How can I help you " + name + '?')
        text = get_audio()
        if text == 0:
            continue
        if "goodnight" in str(text) or "bye" in str(text):
            assistant_speaks("Ok bye, " + name + '.')
            break
        # calling process_text to process the query
        process_text(text)
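Hard-coded paths break as soon as a program is installed somewhere else. As a hedged alternative, the sketch below looks the executable up on the PATH with shutil.which before launching it. The names APP_COMMANDS, resolve_application, and launch_application are hypothetical, and the executable names are assumptions that vary between systems.

```python
import shutil
import subprocess

# Spoken keyword -> executable name; these names are assumptions and
# differ between systems.
APP_COMMANDS = {
    "chrome": "google-chrome-stable",
    "firefox": "firefox",
}


def resolve_application(command, which=shutil.which):
    """Return the full path of the first known app named in `command`, else None."""
    lowered = command.lower()
    for keyword, executable in APP_COMMANDS.items():
        if keyword in lowered:
            return which(executable)  # None when the program is not installed
    return None


def launch_application(command):
    """Start the matching application; return True if something was launched."""
    path = resolve_application(command)
    if path is None:
        return False
    subprocess.Popen([path])
    return True
```

The which parameter exists only so the lookup can be replaced in a test; in normal use you would call launch_application(text) and speak "Application not available" when it returns False.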
Step 6: And then you’re done!
Now you have officially finished building an AI assistant! All that's left to do now is to take it for a trial run and then show it off! You can run it in the terminal/cmd using python3 filename.py.
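For a first trial run it can be easier to type commands than to speak them, since that takes the microphone and speech recognition out of the picture. The following sketch is not part of the original program: run_text_session is a hypothetical helper that feeds a list of typed commands to any handler function (for example process_text) and stops at "goodnight" or "bye", much like the main loop does.

```python
def run_text_session(commands, handler):
    """Pass each command to `handler` until an exit phrase is seen.

    Returns the list of commands that were actually handled.
    """
    handled = []
    for text in commands:
        lowered = text.lower()  # exit phrases are matched case-insensitively here
        if "goodnight" in lowered or "bye" in lowered:
            break
        handler(text)
        handled.append(text)
    return handled


if __name__ == "__main__":
    # print stands in for process_text so the sketch runs on its own.
    run_text_session(["Calculate 6 factorial", "bye", "Cricket"], print)
```

Swapping print for process_text lets you check each branch of the assistant before involving the microphone.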
Step 7: Some important things to keep in mind
- You can customise the above code to your own style and call it your own. With some experience of Python and a few tweaks here and there, it's all yours. The commands listed below are just some examples to show you the power of AI, and you are free to experiment.
- Whenever you need the assistant to get an answer from the Wolfram Alpha API, you must include the word Calculate before the command.
- I have included the entire code and additional instructions in a GitHub repo, which you can download. It includes the audio files used to play the 'song' the assistant sings, along with other files the assistant needs to work.
- You are going to get errors a few times when you run it in the beginning. Debug each of them and check that each package has been installed properly.
- I have not included getting the Google token.
Step 8: Example Commands
- Youtube Chivukula Virinchi: This query searches YouTube for my channel.
- Wikipedia ISRO: This query searches Wikipedia for 'ISRO'.
- Maps New Delhi: This query searches Maps for 'New Delhi'.
- Sing a Song: This query plays a random audio file from the list of songs.
- Where do you live: To this, the assistant replies that it's stuck inside a device.
- Self-Destruct: The assistant replies "Commencing Self-Destruct protocol in T-minus 2 seconds Boom!" and then says it will stick around.
- Calculate square root of 2: To this it says the answer is 1.41421356237…
- Calculate 6 factorial: To this it replies that the answer is 720.
- Cricket: For this command, it searches Google for 'Cricket'.
- Calculate the formula of Methyl Isocyanate: To this it says the answer is C2H3NO.