key: cord-0899224-k33j10er authors: Alhassan, Nadrh Abdullah; Saad Albarrak, Abdulaziz; Bhatia, Surbhi; Agarwal, Parul title: A Novel Framework for Arabic Dialect Chatbot Using Machine Learning date: 2022-03-10 journal: Comput Intell Neurosci DOI: 10.1155/2022/1844051 sha: 229fe3d3255ef0ac13d8c15110cc63868f8bb9a0 doc_id: 899224 cord_uid: k33j10er
With the advent of artificial intelligence and the growing demand for online dialogue systems, chatbots are becoming increasingly popular on various industrial platforms. Their applications are widely noticed as intelligent tools because they are able to mimic human behavior in natural languages. Over the years, chatbots have proven successful for many languages, such as English, Spanish, and French, in varied fields like entertainment, medicine, education, and commerce. However, Arabic chatbots are challenging to build and are scarce, especially in the maintenance domain. Therefore, this research proposes a novel framework for an Arabic troubleshooting chatbot aimed at diagnosing and solving technical issues. The framework addresses the difficulty of processing the Arabic language and the shortage of Arabic chatbot content. This research presents a realistic implementation of creating an Arabic corpus for the chatbot using the developed framework. The corpus is developed by extracting IT problems/solutions from multiple domains and reliable sources. The implementation targets specific technical solutions taken from the customer support websites of well-known organizations such as Samsung, HP, and Microsoft. The claims are supported by experiments on the dataset that compare the proposed system with previous research in this field using different metrics.
Further, the validation shows that the proposed system outperforms previously developed chatbots of different types on several parameters, such as accuracy, response time, dataset, and the solutions given for the user input.
Every generation of computer devices is more complex than the last, and the troubleshooting task becomes more difficult for the end-user [1]. Computer problem detection is a complicated process that demands a high level of knowledge and skill. Troubleshooting is the process of locating the cause of a problem in a system and resolving it [2]. On average, computer repair technicians charge $60 per hour globally, and the hourly rate can range from $40 to $90 [3]. Some of these repairs can be addressed by assisting the user through a chatbot. Furthermore, the chatbot is one of the remarkable artificial intelligence applications that have recently proven their efficiency. The chatbot is a simulation program that analyzes and processes human conversation, either written or spoken. It interacts with end-users via digital devices as if they were communicating with a real person [4]. It provides 24-hour availability, instant answers, endless patience, personalization, and customized dialog to the end-user. Additionally, it benefits companies by saving the cost of hiring manpower, achieving customer satisfaction, and reaching new customers [5]. A chatbot relies heavily on Natural Language Processing (NLP), which is commonly used to make human spoken language understandable to computers [6]. The goal of NLP is to take unstructured input and produce a structured representation of the text that contains understandable language for a textual chatbot conversation [7]. In an Arabic chatbot, NLP is also realized as a trained model that handles the chatbot's textual input and output [8].
The essential part of building chatbots is the conversational interface. Since the chatbot responds to the user query with text, NLP yields an adequate textual conversation for the chatbot [9]. In this paper, we rely on NLP to process the Arabic language by using the presented framework. The research identifies and narrows the gaps in the handling of Arabic dialects and in chatbot development by comparing the performance of the proposed system with previously developed systems. The rest of this paper is organized as follows: next, we present the motivation and contributions and a brief background. We discuss the challenges of developing Arabic chatbots. We then present a comparison with existing systems and describe our framework. After that, we illustrate the chatbot mechanism used in this paper, including the experimental results and a preliminary user evaluation.
The motivation behind this work is that Arabic chatbots do not yet reach the required level in terms of response speed, dataset type, and the handling of user input. The outcome of this paper lies in an implementation with industrial applicability: an Arabic chatbot that is capable of assisting end-users in solving troubleshooting problems in Arabic. This framework aims to reduce the repair costs otherwise incurred through expert troubleshooting. It helps to diagnose IT problems and solve technical issues within a few seconds.
The contributions of this research work are as follows: (i) to study the background on previously developed chatbots and the challenges of using Arabic chatbots in multiple domains; (ii) to propose a novel IT troubleshooting chatbot framework for the Arabic language; (iii) to create the dataset from various sources, including multiple companies such as Microsoft, HP, Samsung, and Huawei and different websites that address common IT problems/solutions, and to apply preprocessing tasks for further use; (iv) to compare the proposed chatbot with previous research on varied dialogue and writing parameters; (v) to evaluate the proposed chatbot by comparing it with other chatbots on different functional evaluation metrics; (vi) to develop a service-based chatbot that provides idiomatic solutions for common IT problems and can be considered for practical industrial applications.
The revolution of information technology in our era means that all governmental and private sectors need rapid digital transformation and continuous development. Our lives are not devoid of technologies, especially those that involve artificial intelligence techniques. Technology has gathered the world's societies in a common cyber environment, where every technology that emerges in any place spreads to all applicable societies. Artificial intelligence relies on data as input, which is then processed via a specific approach to generate the desired output in a manner resembling human thinking. Even so, there is a concern about whether AI applications will gradually remove the need for human thinking. In fact, AI works as an extension of the human brain to solve complex problems and process huge amounts of data at the same time. One AI technique applied across different languages through machine learning is the recognition of city names. The recognition of handwritten city names is a promising application area in postal automation; for recognition, a nonsegmentation (holistic) approach is used.
The role of the convolutional neural network (CNN), one of the deep learning techniques, is examined in [8] for detecting hand signs: it anticipates the next word or recommends the most relevant word and then generates the words with which deaf persons communicate using sign language.
2.1. Arabic Chatbot. Originally, a chatbot was described as a text interaction between a human and a machine that simulates an online conversation (chat) with a user in natural language. The chatbot was first introduced in 1966. Precisely, ELIZA was the first chatbot; it was created at MIT and arguably came close to imitating a human reply. Given an input sentence, ELIZA identified keywords and patterns and matched those keywords against a set of preprogrammed rules to generate appropriate responses [10]. Later, in 1971, Kenneth Colby at Stanford created PARRY, a preprogrammed chatbot that acted like a patient diagnosed with schizophrenia and was able to express fears, anxiety, and beliefs. Chatbot development advanced further in 1995, when Richard Wallace created the A.L.I.C.E chatbot, which uses English conversation patterns stored in AIML files such as (. . .. . ..). AIML is derived from the Extensible Markup Language (XML). After that, around the turn of the millennium, many forms of chatbot were invented. These later inventions are what the AI industry now refers to as modern chatbots.
The Arabic language is considered one of the Afroasiatic languages. Because of its alphabetical structure, it is notoriously difficult to handle in text processing programs: it is read and written from right to left, the opposite direction of English. To further complicate things, the forms of Arabic letters are entirely different from those of English, and vowels are omitted from written Arabic. Globally, there are 422 million Arabic speakers [11].
According to a British Council report, Arabic ranks second among native speakers living in northern Africa, such as Algeria and Morocco, and western Asia, such as Georgia and Azerbaijan, and ranks first in the Arabian Peninsula, such as Saudi Arabia and the United Arab Emirates. Moreover, it ranks fourth as a useful language in the trade market in the UK and in Arabic countries [12]. Significant work has been done on chatbots in general. However, the Arabic language is rarely used in chatbots, especially in the IT sector, and few Arabic chatbot applications/websites are available to the end-user. For instance, in the educational domain, Nabiha is an Arabic chatbot that helps college students by answering their inquiries in informal Arabic through automatic conversation [13]. BOTTA, on the other hand, is an entertaining Arabic conversational chatbot that uses an Egyptian accent to have fun chats with users [14]. The Quran chatbot is an Arabic chatbot that answers users' religious inquiries by generating replies extracted from the Holy Quran [15]. Tafsir Al-Ahlam is an Arabic bot specialized in the interpretation of dreams. It uses a smart search engine to generate output from a local database of a wide range of interpretations taken from the books attributed to Ibn Sirin and Nabulsi son of Shaheen and other accredited authors in the Islamic heritage books, following the origins and rules of the interpretation of dreams from the Quran and the Hadith [16]. In the medical domain, NALA is an application that provides medical consulting under the supervision of the Ministry of Health [17].
Customers may use a service-oriented chatbot to find information in big, complex domains that are difficult to navigate. This is known as a "service-based chatbot"; the main responsibility of this chatbot is to provide a service, in contrast to other chatbots that act only as entertainers.
Many users find it difficult to locate the details they need in website search engine results because of the site's abundance of data. During a conversation with the user, the service-oriented chatbot serves as an automated customer service agent, providing natural language responses and more targeted information. This virtual agent is also programmed to assist with general IT-related questions [18].
A framework is a term commonly used in IT software development. It refers to a structure consisting of many phases that work together as a foundation on which software developers build programs for a specific platform [19]. Accordingly, a bot framework is a set of processes and requirements that guide developers while they are programming the bot and help them solve any problems they meet along the way. A bot framework provides the tools, services, and skills needed to facilitate the developer's job [20]. Most frameworks share certain fundamental phases, but they differ according to the purpose for which the framework was created. Based on the frameworks most commonly proposed in the field of chatbot development, five functional phases are listed as follows. Automatic speech recognition (ASR) is a technology that provides a computer interface driven by the human voice, allowing a human being to interact with a computer in a way that seems very close to interacting with a real human. Natural language understanding (NLU) is closely connected to machine reading comprehension. The process of selecting parts of sentences and analyzing their meaning is tedious because the machine needs to determine the correct syntactic structure and semantics of the language used. Dialogue management is mainly responsible for coordinating the various components of a chatbot. Natural language generation mainly relates to the process of retrieving and producing the answer. Lastly, there is text to speech.
The answer is then ready to be retrieved for the end-user [21]. The following is a brief description of some frameworks used for chatbots in Arabic.
The first framework is the framework of an intelligent Arabic chatbot for teaching Islamic history. The main goal of this framework is to help and guide designers in the efficient use of chatbots and simulations for teaching Islamic history. It consists of three phases, explained as follows. The first phase includes the user input, such as a query, which is stored immediately in the Heroku cloud server. This framework adopts a cloud computing server to avoid the problems and costs of a standard dedicated server. The second phase includes the use and implementation of many programming languages for handling, storing, processing, searching, and retrieving the data. After that, the storage size might need to be determined, since the content of Islamic history is too large to be handled within a smartphone's capacity. So, the framework uses MongoDB in this phase. The third phase enables the bot to record and learn by storing the user input/query in the bot's memory. The program manager updates the bot's memory by matching words or queries with the appropriate answer. Finally, the last process uses advanced programs such as natural language understanding (NLU); it can be carried out manually by human review or automatically by NLU. The current NLU program cannot support the Arabic language, but it can be replaced with any suitable text processing technique [22].
The second framework is a Facebook Arabic chatbot based on deep learning and built with the RASA framework. It is aimed at answering student inquiries at the University of Islam in Indonesia via Facebook messages. College students often need immediate information, for example when asking the help desk for something, especially during the COVID-19 pandemic, and Facebook is popular in Indonesia. There were 166,500,000 Facebook users in Indonesia in August 2020, which accounted for 60.8% of its entire population.
Thus, the RASA framework has been adapted to work with Arabic content [23].
To construct an Arabic chatbot, most of the existing research relies on third-party framework platforms such as the Pandorabots platform. As a result, difficulties have arisen in dealing with Arabic letters as well as with HTML tags, database scope, response, and input processing. The proposed framework overcomes these gaps by building on Python. In terms of Arabic content, the framework uses AI to improve NLP and machine learning techniques; see Table 1. As shown in Table 1, most Arabic chatbot applications use a premade platform such as the Pandorabots API. This kind of platform is offered as a subscription package with yearly/monthly fees, and the developer is in charge of renewing the subscription. Eventually, the developer is burdened with unnecessary additional costs and limited features, and complexity grows at run time. Addressing this is at the top of the list of solutions provided by the proposed framework. Also, code-based application frameworks rarely support Arabic content. This research focuses on reducing complexity and on providing user-friendly programming support with effective applications.
This research introduces a novel IT chatbot framework for troubleshooting that supports the Arabic language. The framework comprises four phases: GUI, Arabic text processing, AI services, and database. Figure 1 demonstrates the components and the processes involved in the framework design. The chatbot starts by receiving the input in text format from the end-user. The bot channels transfer the question/inquiry regarding any potential IT problem as a query to the Arabic text processing phase. At this stage, the framework searches for a matching corpus entry in the existing knowledge of the database phase and returns the answer as the result to the end-user.
Along with that, the framework benefits from AI services to enhance the learning process in the database phase for nonexisting knowledge and to improve text processing in general in the Arabic text processing phase. The text processing phase provides not only NLP but also named entity recognition (NER), which labels named "real-world" objects such as persons, companies, or locations. Entity linking (EL) removes the uncertainty of meaning from an ambiguous sentence, phrase, or other linguistic unit. Text processing is useful for more than just decoding and analyzing text back to its origins. Rather, it goes above and beyond to generate data that support chatbot measures. Because these data are real numbers, it is easy to perform statistical and arithmetical calculations on them, which makes them suitable for use in statistical ML models. Finally, bot channels are used to return the solution in GUI format through a friendly interface.
Moreover, one of the goals is to contribute to the NLP uses of the Arabic language. The proposed framework presents in depth how to process Arabic text in the IT domain. Every chatbot has its own dataset, or in other terms, corpus. The Quran chatbot is one of the Arabic chatbots that used a Java program to produce an Arabic AIML dialogue corpus [15]. BOTTA is another Arabic application that uses AIML files through the Pandorabots platform to generate a dialogue corpus [14]. However, no Arabic chatbot has been developed in the Python programming language with a corpus generated in a Python environment. So, we built a service-based chatbot in Python to provide idiomatic solutions for common IT problems. Besides, we built a knowledge base of IT troubleshooting that can be reused by other researchers.
The work has been implemented on a personal laptop with a dual-core Intel Core i7 CPU and 8 GB of RAM.
The processing speed of the CPU is essential in the programming and data mining phases. The main programming language used is Python, and any related library/package is used to run the chatbot mechanism.
The incredible amount of data on the Internet is a rich resource for any chatbot application. Web scraping is the process used in this project to transform unstructured data into a specific classification; the data are extracted with Python, and the nltk library is used to preprocess the Arabic content. We also use an open-source corpus for the Saudi dialect called MADRA PROJECT, created by data analysts from Jordan. A corpus is a selection of different input statements and responses for the bot to practice with. This project allows Arabic developers to use more than 26 Arabic dialects [24]. In this project, however, we build our own corpus, called chattest.text, which benefits from merging MADRA's corpus with ours.
Handling. Using 50 online contents as the database source, whether structured (csv) or unstructured (text/yml), causes some problems. Thus, we apply text preprocessing with nltk. Tokenization splits the Arabic text into smaller pieces or "tokens." Normalization aims to put all Arabic content on one level. Noise removal cleans extra white spaces from the text. Tables 2 and 3 list the details of the dataset and the websites used for extracting the data. In total, 50 texts with nested content were retrieved. It was difficult to shape the data according to the classification prepared earlier in the research, and some Python libraries/packages do not support the Arabic language. These problems were addressed by using the latest version of core web sm and by scanning the HTML tags of the required content, using the HTML attributes to specify the content. The result of applying the classification process to the data is shown in Figures 2 and 3.
After generating a dataset that is suitable for the IT problem/solution domain, the implementation was done in the Python programming language.
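The three preprocessing steps named above (noise removal, normalization, and tokenization) can be sketched as follows. This is a minimal illustration that uses only the Python standard library rather than nltk, and it is not the authors' code. The specific normalization rules (removing diacritics and tatweel, unifying alef variants, mapping alef maqsura to ya and ta marbuta to ha) and the HTML tag stripping are assumptions chosen for illustration, since the paper does not list its exact rules. All function names are hypothetical.

```python
import re

# Arabic diacritics (tashkeel) plus the superscript alef; assumed noise here.
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
_TATWEEL = "\u0640"  # elongation character used only for visual stretching


def remove_noise(text):
    """Drop HTML-like tags (assumption) and collapse extra white space."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def normalize(text):
    """Map spelling variants to one form (rules are illustrative assumptions)."""
    text = _DIACRITICS.sub("", text)
    text = text.replace(_TATWEEL, "")
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)  # alef variants -> bare alef
    text = text.replace("\u0649", "\u064A")  # alef maqsura -> ya
    text = text.replace("\u0629", "\u0647")  # ta marbuta -> ha
    return text


def tokenize(text):
    """Split into runs of Arabic letters or Latin letters/digits (e.g. 'HP')."""
    return re.findall("[\u0621-\u064A]+|[A-Za-z0-9]+", text)


def preprocess(text):
    """Apply noise removal, then normalization, then tokenization."""
    return tokenize(normalize(remove_noise(text)))
```

Running `preprocess` on a scraped snippet yields a list of normalized tokens that can be stored in the corpus. Mapping ta marbuta to ha is lossy, so a real pipeline may prefer to keep it.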
Python is well suited to making changes to an existing legacy system and offers different libraries, such as NumPy, pandas, PyBrain, and SciPy, that help expedite AI development. For example, one can leverage proven libraries like scikit-learn for ML and regularly updated libraries like Apache MXNet, PyTorch, and TensorFlow for DL and bot projects that support text processing in the Arabic field. Thus, we create a greeting message for the chatbot and then create AI keyword matching for the chatbot's corpus. We use an intelligent (AI) search through the corpus using spaCy in a Python environment. Keyword matching uses a keyword that appears in the query and matches it identically against any word in the chatbot corpus. In a basic retrieval system, keyword matching cannot function without the full sentence or query needed to retrieve the relevant answer. With AI keyword matching, however, the chatbot does not analyze the whole user input but focuses on searching for words or phrases defined in the "user says" entries [25]. The best way to explain the behaviour of AI keywords is to use a realistic example, as shown in Table 4. If the chatbot cannot find an entry that matches a keyword, it returns: "I'm sorry! I do not understand you."
The proposed framework is evaluated in terms of effectiveness. The developed chatbot is compared with other chatbot platforms using different parameters, including response time, type of dataset, and the handling of user input. Two versions of the chatbot were developed, one using the proposed framework and the other using an external framework/platform, namely, Pandorabots [26]. Both chatbots were fed the same datasets. To measure the variation between the two bots, different experiments were performed. Table 5 shows the comparison.
BOTTA's goal is to create a conversational environment and connect with as many Arab users as possible.
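The keyword matching behaviour described in this section, including the fallback reply, can be sketched as below. This is a simplified stand-in that scores corpus entries by plain keyword overlap; it does not use spaCy and is not the authors' implementation. The miniature `CORPUS`, its placeholder answers, and the names `tokens` and `reply` are hypothetical and exist only for illustration.

```python
import re

# Fallback text quoted from the paper.
FALLBACK = "I'm sorry! I do not understand you."

# Hypothetical miniature corpus: each entry pairs a keyword set with an answer.
CORPUS = [
    (
        {"remove", "update"},
        "We do not recommend removing any installed updates. However, if it is "
        "necessary to remove an update, you can do it in the update history.",
    ),
    ({"unlock", "samsung"}, "Placeholder answer for unlocking a Samsung device."),
]


def tokens(text):
    """Lowercase the text and return its set of word tokens (Unicode aware)."""
    return set(re.findall(r"\w+", text.lower()))


def reply(query, corpus=CORPUS):
    """Return the answer whose keywords overlap most with the query.

    Only the keywords are inspected, not the full sentence. If no keyword
    of any entry occurs in the query, the fallback message is returned.
    """
    words = tokens(query)
    best_answer, best_score = FALLBACK, 0
    for keywords, answer in corpus:
        score = len(keywords & words)
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer
```

Because `\w` matches Arabic letters in Python 3, the same function works on Arabic queries once the corpus keywords are written in Arabic and passed through the same preprocessing as the queries.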
She's the first chatbot to speak in an Arabic dialect, which helps her achieve her goal of amusing people who are used to conversing in the language. BOTTA uses AIML and was launched on the Pandorabots platform. BOTTA's pattern matching will be able to correct 85.
Goal-oriented chatbots are a currently popular form of chatbot whose main function is to assist users with a specific set of tasks, such as "how can I unlock my Samsung (mobile) device?" Some publications on goal-oriented chatbots conducted evaluations related to the chatbot's objectives. The metrics include the number of successful conversations ended by the system, the percentage of problems solved, the average dialogue length, and the average user utterance length [27]. The evaluation involved 6 participants from different educational backgrounds, such as engineering, linguistics, and finance. Two were females, 4 were males, 3 were aged between 25 and 34, and 2 were aged between 35 and 44. None of the participants had an IT background.
Translation of the dialogue:
Chatbot: Hi, my name is "the smart assistant." Tell me your technical problem, and I will try to help you.
User: I want to remove an update on my device.
Chatbot: We do not recommend removing any installed updates. However, if it is necessary to remove an update, you can do it in the update history.
Table 5: Comparison of the chatbots (our chatbot and Pandorabot).
Comparison criteria | Chatbot produced using our proposed framework | Chatbot produced using the Pandorabot platform
Type of dataset | Uses unstructured Arabic data (text/yaml). There is no need to define each question/answer because our chatbot can search through the whole corpus within seconds to retrieve the data. | Uses an AIML file, which requires defining each question as and the answer as