Find the word counts in the ‘Functionality’ column of the attached spreadsheet. Write a Python program that computes the count and the frequency (tf-idf) of each word in that column.
The output file must be in Excel format and show:
a. The top 100 words and their counts, by organization name and owner.
b. The top 100 words and their frequency (tf-idf), by organization name and owner.
c. A pie chart or bar graph showing the count and frequency of words by organization name.
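As a rough illustration of items a and b, the sketch below counts words and computes a tf-idf score for each (organization, owner) group using only the standard library. The sample rows, the function names, and the specific tf-idf formula (term frequency times ln(N / df) + 1, with each group treated as one document) are all assumptions, since the brief does not say which tf-idf variant is wanted. Stop-word removal and writing to Excel are left out.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical sample rows: (organization, owner, functionality text).
ROWS = [
    ("A", "Ann", "Improve organizational culture and team culture"),
    ("A", "Bob", "Culture survey for the team"),
    ("B", "Cy", "Budget planning and budget review"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def counts_by_group(rows):
    """Return {(organization, owner): Counter of word counts}."""
    groups = defaultdict(Counter)
    for org, owner, text in rows:
        groups[(org, owner)].update(tokenize(text))
    return groups


def tfidf_by_group(groups):
    """Score each word per group as tf * (ln(N / df) + 1).

    Each group counts as one document (an assumption, see the lead-in).
    """
    n_groups = len(groups)
    doc_freq = Counter()
    for counts in groups.values():
        doc_freq.update(counts.keys())
    scores = {}
    for key, counts in groups.items():
        total = float(sum(counts.values()))
        scores[key] = {
            word: (n / total) * (math.log(n_groups / float(doc_freq[word])) + 1.0)
            for word, n in counts.items()
        }
    return scores


def top_n(mapping, n=100):
    """Return the n highest-valued (word, value) pairs, ties broken alphabetically."""
    return sorted(mapping.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


groups = counts_by_group(ROWS)
scores = tfidf_by_group(groups)
```

Calling `top_n(groups[("A", "Ann")])` or `top_n(scores[("A", "Ann")])` then gives the ranked rows for one organization and owner, which could be written to a worksheet with whichever Excel library is chosen.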
A second output should be a GUI window with:
a. A visualization of the top 20 words by count for each organization. Display the words and their counts as well.
b. The complete text containing these 20 words, in tabular format. Display the date and organization name as well.
c. An interactive chat box within the GUI window.
d. The chat box’s first message should be ‘How can I help you?’.
e. The user will type the word that they want to search for, for example ‘organization’.
f. The chat box will search for the word in the Functionality column of the source Excel file.
g. If the word exists in the Functionality column, the chat box should display its count and frequency by organization and ask ‘Do you want to see the complete texts in which this word exists?’. It should accept ‘yes’ or ‘no’ in response.
h. If the user types ‘yes’, the chat box should display the complete text by organization name and date.
i. The chat box search should not be case sensitive.
j. The chat box should also suggest similar words or phrases that may exist. For example, if the user searches for ‘culture’, the search result should also suggest checking the counts of ‘organizational culture’.
k. The chat box should allow two-word searches, for example ‘organization culture’.
l. The chat box should allow the user to search by organization name. For example, if the user enters ‘A’ (an organization name in the source file), the chat box should display the top 20 words for this organization by date and allow the user to view the complete text in which each word exists.
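The search behaviour described in the chat box items could be sketched roughly as follows, independent of any GUI toolkit. The sample rows and all function names are hypothetical. Matching a two-word query as an exact adjacent phrase, and building suggestions from two-word phrases that contain the searched word, are assumptions about what the brief intends. Only counts are shown, not tf-idf frequency.

```python
import re
from collections import Counter

# Hypothetical sample rows: (organization, date, functionality text).
ROWS = [
    ("A", "2020-01-05", "Improve organizational culture across teams"),
    ("A", "2020-02-10", "Culture survey and culture workshop"),
    ("B", "2020-03-01", "Budget planning for the organization"),
]

NOT_FOUND = "Sorry, I didn't get that. Let's try again."
FOLLOW_UP = "Do you want to see the complete texts in which this word exists?"


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def phrase_hits(terms, words):
    """Count positions where the list `terms` occurs consecutively in `words`."""
    size = len(terms)
    return sum(1 for i in range(len(words) - size + 1) if words[i:i + size] == terms)


def count_by_org(query, rows):
    """Count case-insensitive matches of a word or phrase per organization."""
    terms = tokenize(query)
    counts = Counter()
    if not terms:
        return counts
    for org, _date, text in rows:
        hits = phrase_hits(terms, tokenize(text))
        if hits:
            counts[org] += hits
    return counts


def matching_texts(query, rows):
    """Return the (organization, date, text) rows that contain the query."""
    terms = tokenize(query)
    return [row for row in rows if terms and phrase_hits(terms, tokenize(row[2]))]


def suggest_phrases(query, rows, limit=5):
    """Suggest two-word phrases from the texts that contain the searched word."""
    terms = tokenize(query)
    if len(terms) != 1:
        return []
    phrases = Counter()
    for _org, _date, text in rows:
        words = tokenize(text)
        for first, second in zip(words, words[1:]):
            if terms[0] in (first, second):
                phrases[first + " " + second] += 1
    ranked = sorted(phrases.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _count in ranked[:limit]]


def reply(query, rows):
    """Build the chat box answer for one query."""
    counts = count_by_org(query, rows)
    if not counts:
        return NOT_FOUND
    summary = ", ".join("%s: %d" % (org, n) for org, n in sorted(counts.items()))
    return summary + "\n" + FOLLOW_UP
```

Under this exact-phrase assumption, a query such as ‘organization culture’ would not match the text ‘organizational culture’. If that match is wanted, the comparison would need stemming or prefix matching instead.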
Guidelines for the output Excel files and GUI:
a. Ignore the words after ‘comment’ in the Functionality column of the source file.
b. Ignore stop words.
c. If the user enters a word that does not exist, the chat box should display ‘Sorry, I didn’t get that. Let’s try again.’
d. The chat box can be closed by clicking ‘X’.
e. Whenever the source file is updated, the GUI window and chat box should display the most recent information.
f. Whenever the source file is updated, the output Excel file should display the most recent information.
g. The Python program must run in Anaconda (Spyder, version 2.7).
h. Provide a separate Word document that lists the third-party packages installed and the steps to install them.
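Two of the guidelines above lend themselves to a short sketch: dropping everything after ‘comment’ plus stop words, and noticing when the source file has changed. The stop-word list here is a tiny illustrative sample, and treating any word starting with ‘comment’ (in any letter case) as the cut-off marker is an assumption. `SourceWatcher` is a hypothetical helper that compares the file’s modification time; the program would call `changed()` periodically and rebuild the outputs when it returns True.

```python
import os
import re

# Small illustrative stop-word list (an assumption; a real run would use a fuller list).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def strip_comment(text):
    """Keep only the text before the first word starting with 'comment'."""
    return re.split(r"\bcomment", text, maxsplit=1, flags=re.IGNORECASE)[0]


def clean_words(text):
    """Return lowercase words before 'comment', with stop words removed."""
    words = re.findall(r"[a-z]+", strip_comment(text).lower())
    return [word for word in words if word not in STOP_WORDS]


class SourceWatcher(object):
    """Report whether a file's modification time changed since the last check."""

    def __init__(self, path):
        self.path = path
        self.last_mtime = os.path.getmtime(path)

    def changed(self):
        mtime = os.path.getmtime(self.path)
        if mtime != self.last_mtime:
            self.last_mtime = mtime
            return True
        return False
```

For example, `clean_words("Track the budget for staff. Comment: ignore this part")` keeps only the words before the marker and drops ‘the’ and ‘for’.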