Title

Good afternoon, ladies and gentlemen. Before starting the presentation, I would like to ask you a question.

Can you please raise your hand if you have ever used Pathao to get around the city? Thank you for responding. Like most of you, I use Pathao a lot for faster transportation.

I am Mahmudul Islam. Today I am going to present our paper. The topic is: Comparison of The Efficiency of Machine Learning Algorithms on Twitter Sentiment Analysis of Pathao.

As you can see from the topic, our research centers on Pathao. We used machine learning algorithms to analyze Pathao users' reviews.

Outline

Here is the outline of the presentation:

Introduction: Describe the problem statement briefly and how our proposed solution can benefit businesses.

Research Objectives: Discuss the goal of the research and why we were interested in researching this particular topic.

Problem Statement: Address the actual issue on which the research was conducted.

Research Methodology: Briefly discuss the research approach & process.

Result & Discussion: State the findings of our observations using graphs & tables.

Conclusion: Discuss key outcomes of this research.

Future Work: Talk about the future scope of this research.

Introduction

  1. In this era of information and technology, we virtually live on social media such as Facebook, Twitter and so on.
  2. Every day, people use different services and share their opinions about them, whether good or bad.
  3. As a result, social media sites are flooded with opinions and information.
  4. Businesses like Pathao could benefit a lot from analyzing the information and opinions posted by their customers.
  5. If this huge volume of information is analyzed manually, the cost and time will be unreasonable and the accuracy cannot be guaranteed.
  6. This is where technology comes in, specifically machine learning, which can be an efficient way to solve this problem.

Research Objective

We had both emotional and practical motivations for choosing this topic.

As you may know, several months ago Pathao sacked 300 top-level employees. Everyone has a family, so a large number of families were affected by this layoff. We wanted to see what went wrong with Pathao, so we selected it as our research topic.

As far as we know, few if any companies in Bangladesh have adopted machine learning technology to improve the service quality of their business. That was a good enough reason for us to bring the power of machine learning to local businesses. We intended to apply different machine learning algorithms and find the most efficient one among them for sentiment analysis.

The Buzzwords

Here are some key terms related to our research:

Natural Language Processing (NLP) is the technology used to help computers understand human language. The ultimate objective of NLP is to read, decipher, and understand human languages.

Machine Learning is a process in which we feed computers data so that they learn from it, somewhat as humans learn from experience, and gradually improve as they see more data over time.

Sentiment Analysis is the process of computationally identifying a person's opinion towards a particular service or product. The opinion can be positive or negative.

Sentiment Expressed in Tweets

Sample real-time tweets from Pathao users on Twitter.

As you can read, the first statement expresses a positive vibe, hence it's a positive sentiment.

The second statement expresses a negative vibe, hence it's a negative sentiment.

Approaches for Sentiment Analysis

Read From Slide

Machine Learning Classifiers

Naive Bayes

Naive Bayes provides a way to calculate the posterior probability: the probability of a cause or hypothesis given evidence that has already been observed.

For example: Hillary Clinton lost the last US presidential election. What is the probability that the Russian secret service intervened in the election process? Using Bayes' theorem, we can estimate the probability of this event given that outcome.
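To show how the same idea applies to tweets, here is a minimal sketch of a Naive Bayes text classifier in plain Python. It is an illustration under stated assumptions, not the implementation used in our paper: the training tweets are made up, and the function names `train_naive_bayes` and `predict` are hypothetical. It picks the class with the highest posterior score, using add-one smoothing for the word likelihoods.

```python
import math
from collections import Counter

# Made-up training tweets for illustration only (not from the study's dataset).
TRAIN = [
    ("fast ride friendly rider", "pos"),
    ("great service fast pickup", "pos"),
    ("late rider rude behaviour", "neg"),
    ("app crashed ride cancelled late", "neg"),
]


def train_naive_bayes(examples):
    """Count how often each class and each word-per-class occurs."""
    class_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.split():
            counts[word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(text, model):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Log prior: log P(class).
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Log likelihood log P(word | class) with add-one smoothing.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


model = train_naive_bayes(TRAIN)
print(predict("fast friendly service", model))  # pos
print(predict("rider late again", model))       # neg
```

Working with log probabilities avoids multiplying many small numbers together, which is a common design choice for this kind of classifier.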

SVM

SVM (Support Vector Machine) draws a decision boundary between two different classes, leaving the widest possible margin between them.

For example: On a sunny day you are walking in a park. There are lots of people with their pets, dogs and cats. Suddenly you see one of these pets and cannot tell whether it is a dog or a cat. Now you need help from a machine to sort this out. This is where the SVM algorithm comes in. It draws a decision boundary between the dog class and the cat class based on features specific to these pets. That helps you figure out whether the pet is a cat or a dog.
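The dog-versus-cat idea can be sketched in a few lines of Python. This is a simplified linear SVM trained with subgradient descent on the hinge loss, not a full SVM solver and not necessarily what we used in the paper. The two features per pet are made-up, standardised numbers (think of body size and snout length relative to the average pet), and the names `train_linear_svm` and `classify` are hypothetical.

```python
# Made-up, standardised features for illustration: (body size, snout length).
CATS = [(-2.0, -1.0), (-1.0, -2.0), (-2.0, -2.0)]
DOGS = [(2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]

POINTS = CATS + DOGS
LABELS = [-1] * len(CATS) + [1] * len(DOGS)  # -1 = cat, +1 = dog


def train_linear_svm(points, labels, epochs=50, lr=0.1, lam=0.01):
    """Learn a line w.x + b = 0 by subgradient descent on the hinge loss."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            if margin < 1:
                # Point is inside the margin or misclassified: pull the
                # boundary towards classifying it correctly.
                w[0] += lr * (y * x1 - lam * w[0])
                w[1] += lr * (y * x2 - lam * w[1])
                b += lr * y
            else:
                # Point is safely classified: only apply regularisation,
                # which shrinks the weights and so favours a wide margin.
                w[0] -= lr * lam * w[0]
                w[1] -= lr * lam * w[1]
    return w, b


def classify(point, w, b):
    """Report which side of the decision boundary the point falls on."""
    score = w[0] * point[0] + w[1] * point[1] + b
    return "dog" if score >= 0 else "cat"


w, b = train_linear_svm(POINTS, LABELS)
print(classify((3.0, 3.0), w, b))
print(classify((-3.0, -3.0), w, b))
```

The sign of `w.x + b` tells us which side of the boundary a new pet falls on, which is the "dog or cat" decision from the park example.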

Logistic Regression

Logistic regression is another classification algorithm, where the output is a probability in the range [0, 1].

For example: You want to check whether a particular email is spam or ham. If the output of the sigmoid function is between 0.5 and 1, it's spam. Otherwise it's not spam.
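Here is a minimal sketch of that decision rule in Python. The weights, bias, and the two features (how many times "free" appears and how many links the email contains) are made-up values for illustration, not learned from real data, and `classify_email` is a hypothetical helper name.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def classify_email(features, weights, bias, threshold=0.5):
    """Return the label and the estimated spam probability."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    probability = sigmoid(z)
    label = "spam" if probability >= threshold else "ham"
    return label, probability


# Hypothetical model: features are [count of "free", count of links].
WEIGHTS = [1.5, 0.8]
BIAS = -2.0

print(classify_email([2, 1], WEIGHTS, BIAS))  # many "free"s and a link
print(classify_email([0, 0], WEIGHTS, BIAS))  # neither feature present
```

In a real model the weights and bias would be learned from labelled emails; only the sigmoid and the 0.5 threshold are fixed parts of the recipe described above.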

Problem Statement

People use Pathao on a daily basis and frequently post reviews of its services on social media. However, these scattered reviews are not helping Pathao improve different aspects of its business, such as service quality and customer satisfaction rate.

There is also no automated solution, free of manual labor, with which we can evaluate Pathao's user reviews and measure the efficiency of the process.

Existing Approach

Read From Slide

However, these existing approaches are not as cost-effective and efficient as our proposed approach.

Research Methodology

Read From Slide

Result & Discussion

Read From Slide

Conclusion

Read From Slide

Future Work

Read From Slide

What is a lexicon dictionary?

A large collection of words and phrases expressing positive and negative opinions, each labeled with its sentiment.
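As a rough illustration of how such a dictionary can be used, the sketch below counts positive and negative words in a tweet. The two tiny word sets are made up for this example (a real lexicon is far larger), `lexicon_sentiment` is a hypothetical name, and returning "neutral" on a tie is my own assumption.

```python
# Tiny made-up lexicon for illustration only.
POSITIVE = {"good", "great", "fast", "friendly"}
NEGATIVE = {"bad", "late", "rude", "slow"}


def lexicon_sentiment(text):
    """Label a text by counting positive versus negative lexicon words."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # assumption: ties and unknown words count as neutral


print(lexicon_sentiment("The rider was friendly and fast"))  # positive
print(lexicon_sentiment("Rider was late and rude"))          # negative
```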

What is a classification model?

A classification model tries to draw a conclusion from the given input. It predicts the class label or category for new data. For example, when filtering emails, a classification model labels them as "spam" or "not spam".

What is tf-idf?

tf–idf or TFIDF, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus. It is often used as a weighting factor in text mining.

Term frequency (tf) – the number of times that term t occurs in document d.

Inverse document frequency (idf) is a measure of how much information the word provides, that is, whether the word is common or rare across all documents.
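A small worked example may help here. The sketch below uses the raw count as tf, as defined above, and the plain logarithmic form log(N / df) as idf. Libraries often use smoothed variants, so the exact numbers can differ. The three tokenised "tweets" are made up, and the function names are hypothetical.

```python
import math

# Made-up, already tokenised documents for illustration only.
DOCS = [
    ["pathao", "ride", "fast"],
    ["pathao", "ride", "late", "late"],
    ["pathao", "food", "cold"],
]


def term_frequency(term, doc):
    """Number of times the term occurs in the document."""
    return doc.count(term)


def inverse_document_frequency(term, docs):
    """log(N / df): rarer terms across the collection get a higher value."""
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df) if df else 0.0


def tf_idf(term, doc, docs):
    return term_frequency(term, doc) * inverse_document_frequency(term, docs)


# "pathao" appears in every document, so it carries no information.
print(tf_idf("pathao", DOCS[0], DOCS))  # 0.0
# "late" appears twice in one document and nowhere else, so it scores high.
print(tf_idf("late", DOCS[1], DOCS))
```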

What is an n-gram?

An n-gram is simply a sequence of N words. For instance, consider the following examples.

  1. San Francisco (is a 2-gram)
  2. The Three Musketeers (is a 3-gram)
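To see how n-grams are pulled out of a sentence, here is a minimal sketch that splits on whitespace and slides a window of N words across the text. The function name `ngrams` is hypothetical, and real tokenisation would also handle punctuation.

```python
def ngrams(text, n):
    """Return every run of n consecutive words in the text as a tuple."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


print(ngrams("The Three Musketeers", 2))
# [('The', 'Three'), ('Three', 'Musketeers')]
print(ngrams("The Three Musketeers", 3))
# [('The', 'Three', 'Musketeers')]
```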