Valentyn M. Yanchuk, Andrii G. Tkachuk, Dmitry S. Antoniuk, Tetiana A. Vakaliuk and Anna A. Humeniuk, Zhytomyr Polytechnic State University, Zhytomyr 10005, Ukraine
The variety of goods and services in the contemporary world requires permanent improvement of e-commerce platform performance. Modern society is so deeply integrated with mail delivery and the online purchase of goods and services that competition between providers has become a key selection factor. Since logistics and timely, cost-effective delivery play an important part, the authors decided to analyze possible improvements in this field, especially for regions located far from popular distribution centers. For both fast and slow delivery, cost plays an important role for each end user. The given work proposes a simulation that analyzes the current cost of delivering e-commerce orders by the supplier fleet, a worldwide delivery service fleet, and possible vendor drop-ship, and checks whether alternative routes can be used to minimize costs. The investigation focuses on small and mid-sized businesses located far from big distribution centers (excluding edge cases such as lighthouses or remote rocks with very limited accessibility) that actively use e-commerce solutions in their daily activities. The authors analyzed and proposed a solution to the problem of cost optimization for long-distance package delivery using a combination of paths served by supplier fleets and worldwide and local carriers. Data models and add-ons of contemporary Enterprise Resource Planning systems were used, and additional development is proposed for changing the flow selection. The experiment is based on data sources from real United States companies that use a wide range of delivery carriers; it applies repeated simulations to analyze the variance of the obtained solutions.
Simulation, Customer Behavior, Optimization, E-commerce.
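The channel comparison described in the abstract above can be sketched as a small Monte Carlo simulation. The channel names, per-mile rates, fixed fees, and distance range below are purely illustrative assumptions, not the paper's actual ERP data:

```python
import random

# Hypothetical per-mile rates and fixed fees for the three delivery channels;
# the paper's real inputs come from company ERP data sources.
CHANNELS = {
    "supplier_fleet": (1.20, 15.0),     # (rate per mile, fixed fee)
    "worldwide_carrier": (0.80, 40.0),
    "vendor_dropship": (0.95, 25.0),
}

def channel_cost(channel, miles):
    rate, fee = CHANNELS[channel]
    return fee + rate * miles

def cheapest(miles):
    """Pick the channel that minimizes cost for a single order."""
    return min(CHANNELS, key=lambda c: channel_cost(c, miles))

def simulate(n_orders, seed=0):
    """Repeat the choice over randomly drawn long-distance orders to see how
    often each channel wins, mimicking the paper's repeated simulations."""
    rng = random.Random(seed)
    wins = {c: 0 for c in CHANNELS}
    for _ in range(n_orders):
        wins[cheapest(rng.uniform(50, 600))] += 1
    return wins

print(cheapest(30))   # short haul: supplier_fleet
print(cheapest(500))  # long haul: worldwide_carrier
```

With these toy rates, the low-fee supplier fleet wins short hauls while the low-rate worldwide carrier wins long ones; the simulation then measures how often each case occurs.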
Yifei Yu1, Yu Sun2 and Fangyan Zhang3, 1Sage Hill, Newport Coast, CA, 92657, 2California State Polytechnic University, Pomona, CA, 91768, 3ASML, San Jose, CA, 95131
As people get old, their risk of falling increases, and falls affect senior citizens more severely than younger people. My grandmother once fell and hit her head while she was alone at home, and she instantly became unconscious. Frequently, senior citizens are unable to help themselves after they fall, even if they remain conscious. However, there is no product that senior citizens can use to notify their relatives right away if they fall, and this leads to the question of how we can bring immediate aid to all senior citizens after a fall. This paper brings forward a product and software that can solve this problem. The product is a small wristband that detects any falls or collisions and notifies relatives right away. The software is an accompanying app that shows the data recorded from those falls or collisions, specifically designed for family members to keep track of their elders. We applied our application during our test sessions and conducted a qualitative evaluation of the approach. The results show that the approach is an effective solution to our problem, though with a few limitations and weaknesses.
Detection of falling, wristband, iOS, Android.
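A threshold on the acceleration magnitude is the simplest way such a wristband could flag a fall. The abstract does not specify the detection algorithm, so the function and the 2.5 g threshold below are a hypothetical sketch, not the product's actual logic:

```python
import math

def detect_fall(samples, threshold_g=2.5):
    """Flag a fall when the acceleration magnitude spikes above a threshold.

    samples: list of (ax, ay, az) accelerometer readings in units of g.
    Returns the index of the first sample exceeding the threshold, or None.
    """
    for i, (ax, ay, az) in enumerate(samples):
        magnitude = math.sqrt(ax * ax + ay * ay + az * az)
        if magnitude > threshold_g:
            return i  # trigger the notification to relatives here
    return None

# Normal wear hovers around 1 g; a fall produces a sharp spike.
readings = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.1), (1.9, 1.8, 2.0), (0.0, 0.0, 1.0)]
print(detect_fall(readings))  # 2
```

A production device would combine such a spike test with orientation and post-impact stillness checks to cut false alarms.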
Saad Al-Ahmadi and Badour AlMulhem, Department of Computer Science, King Saud University, Riyadh, Saudi Arabia
Wireless sensor networks (WSNs) have proliferated rapidly as a cost-effective solution for data aggregation and measurement in challenging environments. Sensors in WSNs are cheap, powerful, and consume limited energy. Energy consumption is the dominant concern because it has a direct and significant influence on the application's lifetime. Recently, the availability of small and inexpensive components such as microphones has promoted the development of wireless acoustic sensor networks (WASNs). Examples of WASN applications are hearing aids, acoustic monitoring, and ambient intelligence. Monitoring animals, especially those that are becoming endangered, can assist biology researchers' preservation efforts. In this work, we first explore the existing methods used to monitor animals by recognizing their sounds. Then we propose a new energy-efficient approach for identifying animal sounds based on frequency features extracted from acoustically sensed data. This approach represents a suitable solution that can be implemented and used in various applications. Moreover, the proposed system balances application efficiency against the sensor's energy capabilities. The energy savings are achieved by processing the recognition tasks on each sensor, so that only the recognition results are sent to the base station.
Wireless Acoustic Sensor Network, Animal sound recognition, frequency features extraction, energy-efficient recognition schema in WASN.
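Extracting frequency features on the node itself, as the abstract proposes, could start with something as simple as locating the dominant frequency of a sound clip. This sketch uses a naive DFT to stay dependency-free; a real sensor node would run an optimized FFT, and the 440 Hz tone is just synthetic test input:

```python
import cmath
import math

def dominant_frequency(signal, sample_rate):
    """Return the dominant frequency (Hz) of a signal via a naive DFT."""
    n = len(signal)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):  # skip DC, keep positive frequencies
        s = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
        if abs(s) > best_mag:
            best_bin, best_mag = k, abs(s)
    return best_bin * sample_rate / n

# A pure 440 Hz tone sampled at 8 kHz should peak at the 440 Hz bin.
rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / rate) for t in range(800)]
print(dominant_frequency(tone, rate))  # 440.0
```

Sending only such scalar features (or the class label they map to) instead of raw audio is what yields the energy savings the abstract describes.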
Amin Heydari Alashti1, Ahmad Asgharian Rezaei2, Alireza Elahi3, Sobhan Sayyaran4, Mohammad Ghodsi5, 1BigData Solutions Land, Iran, 2RMIT University, 3Shahid Beheshti University, Tehran, Iran, 4Imam Sadegh University, 5Computer Science Faculty, Sharif University of Technology, Tehran, Iran
Access to required data on the internet has relied widely on search engines for the last two decades, owing to the huge amount of available data and the high rate at which new data is generated daily. Accordingly, search engines are encouraged to make the most valuable existing data on the web searchable. Knowing how to handle a large amount of data at each step of a search engine's procedure, from crawling to indexing and ranking, is just one of the challenges that a professional search engine should solve. Moreover, it should also follow best practices in handling user traffic, use state-of-the-art natural language processing tools, and address many other challenges at the edge of science and technology. As a result, evaluating these systems is highly challenging due to their internal complexity, yet crucial for finding an improvement path for an existing system. Therefore, an evaluation procedure is a normal subsystem of a search engine, with the role of building its roadmap. Recently, several countries have developed national search engine programs to build an infrastructure for providing special services, based on their needs, over the web data available in their language. This research was conducted accordingly to illuminate the advancement path of two Iranian national search engines, Yooz and Parsijoo, in comparison with two international ones, Google and Bing. Unlike related work, it uses a semi-automatic method to evaluate the search engines at the first pace. Eventually, we obtained some interesting results, based on which a component-based improvement roadmap for the national search engines can be concretely drawn.
Automatic Search Engine Evaluation, Component-based Search Engine Evaluation, Yooz, Parsijoo, Google, Bing.
Wisal Khan1, Teerath Kumar2 and Bin Luo1, 1Anhui University, Hefei 230039, People's Republic of China, 2Kyung Hee University, South Korea
Pseudo example generation has shown impressive performance on image classification tasks. It is useful when only a small amount of data is available for semi-supervised learning or few-shot learning. Previous work used an autoencoder architecture to improve classification performance in semi-supervised learning, while pseudo example generation and its optimization have improved performance in few-shot learning. In this paper, we propose a new way of generating pseudo examples using only a generator (decoder) based approach to produce pseudo examples for each class, which is effective for both semi-supervised learning and few-shot learning. In our approach, we first train a decoder for each class using random noise as input and examples as output. Once training is done, we generate different numbers of samples using the trained decoders. To check the effectiveness of our approach, we use semi-supervised learning and few-shot learning techniques on the well-known MNIST and FMNIST datasets for different numbers of selected samples. Our generator-based approach outperforms previous semi-supervised learning and few-shot learning approaches. Secondly, we are the first to release the UrduMNIST dataset, which consists of 10,000 images, including 8,000 training and 2,000 test images, collected through three different methods to ensure diversity. We also check the effectiveness of our methods on the UrduMNIST dataset.
Autoencoder, Generator, Semi-Supervised learning, few-shot learning.
Noor Fatima, Department of Computer Science, Aligarh Muslim University, Aligarh- 202002, India
Judging the quality of a photograph from the perspective of a photographer, we can ascertain that resolution, symmetry, content, and location are some of the factors that influence its proficiency. The exponential growth in the allure of photography impels us to discover ways to perfect an input image in terms of the aforesaid parameters. While content and location are immutable, attributes like symmetry and resolution can be worked on. In this paper, we prioritize resolution as our cynosure, and there can be multiple ways to refine it. Image super-resolution is progressively becoming a prerequisite in the fraternity of computer graphics, computer vision, and image processing; it is the process of obtaining high-resolution images from their low-resolution counterparts. In our work, we studied image super-resolution techniques such as interpolation, SRCNN, SRResNet, and GANs for the post-enhancement of photographs, as employed by photo editors, and showed how Generative Adversarial Networks (GANs) stand as the most coherent approach for attaining optimized super-resolution in terms of quality by applying a deep adversarial neural network.
Interpolation, GAN, image processing, computer vision, super-resolution.
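Of the techniques surveyed in the abstract above, interpolation is simple enough to show in full. This sketch implements nearest-neighbour upscaling, the crudest baseline that SRCNN, SRResNet, and GAN-based super-resolution all improve upon:

```python
def nearest_neighbour_upscale(img, factor):
    """Upscale a 2-D grayscale image by replicating each pixel `factor` times.

    Blocky output is exactly why learned methods such as GANs win on quality:
    interpolation can only reuse existing pixels, never hallucinate detail.
    """
    return [[img[y // factor][x // factor]
             for x in range(len(img[0]) * factor)]
            for y in range(len(img) * factor)]

# A 2x2 image becomes 4x4; each pixel turns into a 2x2 block.
small = [[10, 20],
         [30, 40]]
print(nearest_neighbour_upscale(small, 2)[0])  # [10, 10, 20, 20]
```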
Ias Sri Wahyuni1 and Rachid Sabre2, 1Universitas Gunadarma, Jl. Margonda Raya No. 100 Depok 16424, Indonesia, 2Laboratory Biogéosciences CNRS, University of Burgundy/Agrosup Dijon, France
The aim of multi-focus image fusion is to integrate images with different objects in focus so as to obtain a single image with all objects in focus. In this work, we present a novel multi-focus image fusion method based on neighbour local variability (NLV). The method takes into consideration the information in the region surrounding each pixel. Indeed, at each pixel, the method exploits the local variability calculated from the quadratic differences between the value of the pixel and the values of all pixels belonging to its neighbourhood. It expresses the behaviour of the pixel relative to its neighbours. The variability preserves edge features because it detects abrupt changes in image intensity. The fusion is performed by weighting each pixel by the exponential of its local variability. The precision of this fusion depends on the size of the neighbourhood, which in turn depends on the blurring, characterized by the variance and the size of the blurring filter. We construct a model that gives the neighbourhood size from the variance and the size of the blurring filter. We compare our method with other methods and show that our method gives the best results.
Neighbour Local Variability, Multi-focus image fusion, RMSE.
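The weighting rule described in the abstract above (each pixel weighted by the exponential of its local variability) can be sketched directly. This minimal version fixes the neighbourhood radius instead of deriving it from the blur model, and assumes registered grayscale images with values in [0, 1]:

```python
import math

def local_variability(img, x, y, radius=1):
    """Mean quadratic difference between a pixel and its neighbours."""
    h, w = len(img), len(img[0])
    diffs = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ny, nx = y + dy, x + dx
            if (dx or dy) and 0 <= ny < h and 0 <= nx < w:
                diffs.append((img[y][x] - img[ny][nx]) ** 2)
    return sum(diffs) / len(diffs)

def fuse(img_a, img_b, radius=1):
    """Fuse two images by weighting each pixel with exp(local variability),
    so the sharper (higher-variability) source dominates at every pixel."""
    h, w = len(img_a), len(img_a[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            wa = math.exp(local_variability(img_a, x, y, radius))
            wb = math.exp(local_variability(img_b, x, y, radius))
            out[y][x] = (wa * img_a[y][x] + wb * img_b[y][x]) / (wa + wb)
    return out

# A sharp checkerboard patch (high variability) dominates a flat, blurred one.
sharp = [[0.0, 1.0], [1.0, 0.0]]
blurred = [[0.5, 0.5], [0.5, 0.5]]
fused = fuse(sharp, blurred)
```

The fused values are pulled toward the in-focus image wherever it shows strong intensity changes, which is how the method preserves edges.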
Qihao Lin, Jinyu Cai and Genggeng Liu, College of Mathematics and Computer Science, Fuzhou University, Fuzhou, 350116, China
The high dimensionality of image data is an obstacle to clustering. One way to address it is feature representation learning. However, if an image is distorted or suffers from the influence of noise, extracting effective features may be difficult. In this paper, an end-to-end feature learning model is proposed to extract denoising low-dimensional representations from distorted images, and these denoising features are evaluated against several feature representation methods on clustering tasks. First, some related work on classical dimensionality reduction is introduced. Then the architecture and working mechanism of the denoising feature learning model are presented. Owing to its structural characteristics, the model can obtain essential information from an image and decrease the reconstruction error. When facing corrupted data, it also achieves better clustering results. Finally, extensive experiments demonstrate that the feature representations obtained by the proposed model are effective on eight standard image datasets.
Unsupervised Learning, Feature Representation, Auto-encoder, Clustering.
Bill Zheng1, Yu Sun2, Fangyan Zhang3, 1Claremont, CA 91711, 2California State Polytechnic University, Pomona, CA, 91768, 3ASML, San Jose, CA, 95131
In the current political climate, mass media is depicted as highly divisive and inaccurate, while many readers cannot efficiently identify the bias presented in the news. Using research on keywords in the current political environment, we have designed an algorithm that detects and quantifies the political, opinion, and satirical biases present in current-day articles. Our algorithm uses scikit-learn's linear regression and multiple regression models to automatically identify the bias of a news article on a scale of 0 to 3 (-3 to 3 for political bias detection). Its usage on all three segments (politics, opinion, and satire) has proven effective, and it enables an average reader to accurately evaluate the bias in a news source.
Mass media, political bias, machine learning, linear regression, regression model.
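The keyword-plus-regression idea in the abstract above can be illustrated with a deliberately tiny sketch. The keyword list, training pairs, and single feature are all invented for illustration; the actual system fits richer features with scikit-learn:

```python
# Hypothetical sketch: score bias from the frequency of charged keywords,
# then fit a one-feature least-squares line against hand-labelled articles.
PARTISAN_WORDS = {"radical", "corrupt", "disaster", "hoax"}  # illustrative only

def keyword_rate(article):
    """Charged keywords per 100 words."""
    words = article.lower().split()
    return 100.0 * sum(w.strip(".,!?") in PARTISAN_WORDS for w in words) / len(words)

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Toy training set: keyword rate vs. hand-labelled bias score.
rates = [0.0, 10.0, 20.0, 30.0]
labels = [0.0, 1.0, 2.0, 3.0]
a, b = fit_line(rates, labels)
score = a * keyword_rate("The committee met today to discuss the radical proposal") + b
print(round(score, 2))  # 1.11
```

A real deployment would clamp predictions to the rating scale and use separate models per segment (politics, opinion, satire).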
Ruichu (Eric) Xia1, Yu Sun2, Fangyan Zhang3, 1Santa Margarita Catholic, Rancho Santa Margarita, CA 92688, 2California State Polytechnic University, Pomona, CA, 91768, 3ASML, San Jose, CA, 95131
Many people today suffer from the negative effects of procrastination and poor time management, which include lower productivity, missed opportunities, lower self-esteem, and increased levels of guilt, stress, frustration, and anxiety. Although people can often recognize their tendency to procrastinate and the need to change this bad habit, the majority still do not take meaningful action to prevent themselves from procrastinating. To help people fix this problem, we created a goal-tracking mobile application called iProgress that aims to assist and motivate people to better manage their time. It allows them to create the short-term and long-term goals they want to achieve and encourages them to complete those goals through a rank/reward system that lets them compete with other users by completing more goals.
Procrastination, iProgress, flutter, iOS, Android.
Erenus Yildiz and Florentin Wörgötter, III. Physics Institute, Georg-August University of Göttingen, Göttingen, Germany
E-waste recycling is thriving, yet many challenges remain to be addressed before high-degree, device-independent automation of recycling is possible. One of these challenges is to have automated procedures to detect and locate the wires in a device in order to cut and remove them. Here we specifically consider the problem of instance segmentation to address this need. We selected several state-of-the-art instance segmentation networks and conducted a comparative evaluation of their performance in detecting wires found in an electronic disassembly environment. We show that, given a very limited dataset, a visual scheme for an automated disassembly routine should use the top-scoring network trained with heavy augmentation to detect and outline the wires found. Through experimental evaluation, we report mean scores of over 90% on the IoU and SSIM metrics. The dataset and code of this study are made public to facilitate further research.
Wire Detection, Automation, Disassembly, Recycling, Neural-Networks, E-Waste.
Henry Hamilton1, Yu Sun2, Fangyan Zhang3, 1CA 91765, 2California State Polytechnic University, Pomona, CA, 91768, 3ASML, San Jose, CA, 95131
This system provides a method of automatically keeping water bowls full, refilling them every time it detects that they are not. This is highly useful for anyone who owns a pet, as it decreases the amount of work the owner needs to do. The system uses an AI model trained on over a thousand images of water bowls, which allows it to accurately determine when a bowl needs filling. When an empty bowl is spotted, a subsystem consisting of a valve and other electronic parts releases stored water into the bowl. Through experimentation, it has been shown that the accuracy of the system is about 97% under optimal lighting conditions. Without a light source, the system does not function. Currently, the components are not of the highest quality and the system only works with the bowl used in testing. There are future plans to train the model with new pictures featuring an assortment of bowls. Additionally, an LED could be added to the system to solve the issue of it not working without external light.
Artificial Intelligence, image detection, RPI system processor.
Justin Kim1, Yu Sun2, Fangyan Zhang3, 1Rancho Cucamonga, CA 91739, 2California State Polytechnic University, Pomona, CA, 91768, 3ASML, San Jose, CA, 95131
Recent years have seen a large increase in the number of programmers, especially as more online learning resources became available. Many beginner coders struggle with bugs in their code, mostly as a result of a lack of knowledge and experience. The common remedy is the wealth of online resources that can address these issues. However, this is inconvenient for the coder, who may be programming as a hobby and may not have the time or patience to look for a solution. In this project, we address this problem by integrating the coding and error-resolving environments. A website has been developed that examines code and provides simpler error messages that give a more comprehensive understanding of the bug. Once an error has been added to the database, the program is able to display the error in a more understandable way. Experiments show that, given several sample programs, our tool is able to extract the errors and report a more easily understandable solution.
Programming environment, python, server, database.
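The database of friendlier error messages described in the abstract above can be sketched in a few lines. The entries and wording below are illustrative stand-ins for the site's real, incrementally grown database:

```python
# A tiny lookup "database" mapping known error types to plain-language
# explanations; entries here are illustrative.
ERROR_DB = {
    "NameError": "You used a variable before giving it a value. Check for typos.",
    "ZeroDivisionError": "Somewhere you divided by zero. Guard the denominator.",
    "IndexError": "You asked for a list position that does not exist.",
}

def explain(source):
    """Run a snippet and translate any raised error into a simpler message."""
    try:
        exec(source, {})
        return "No errors found."
    except Exception as exc:
        kind = type(exc).__name__
        hint = ERROR_DB.get(kind, "This error is not in the database yet.")
        return f"{kind}: {hint}"

print(explain("print(unknown_var)"))
```

Errors not yet in the database fall through to a default message, mirroring how the website only improves messages for errors that have been catalogued.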
Luciana Abednego and Cecilia Esti Nugraheni, Department of Informatics, Parahyangan Catholic University, Indonesia
This paper conducts some experiments with forex trading data. The data used is from kaggle.com, a website that provides datasets for machine learning practitioners and data scientists. The goal of the experiments is to learn how to design the many parameters of a forex trading robot. Some of the questions to be investigated are: How far from the open position should the robot set the stop-loss or target-profit level? When is the best time to apply a forex robot that works only in a trending market? Which is better: a forex trading robot that waits for a trending market, or one that works during a sideways market? To answer these questions, the data is visualized in many types of graphs. The visualizations are built using Weka, an open-source machine learning tool. The data visualization helps the trader design a strategy for trading the forex market.
forex trading data, forex data experiments, forex data analysis, forex data visualization.
Cecilia E. Nugraheni, Luciana Abednego, and Maria Widyarini, Dept. of Computer Science, Parahyangan Catholic University, Bandung, Indonesia
The apparel industry is a type of textile industry. One of the scheduling problems found in apparel production can be classified as a Flow Shop Scheduling Problem (FSSP). GPHH for FSSP is a genetic-programming-based hyper-heuristic technique for solving FSSP. The algorithm aims to generate new heuristics from two basic (low-level) heuristics, namely the Palmer algorithm and the Gupta algorithm. This paper describes the implementation of the GPHH algorithm and the results of experiments conducted to determine its performance. The experimental results show that the proposed algorithm is promising and performs better than both the Palmer algorithm and the Gupta algorithm.
Hyper-heuristic, Genetic Programming, Palmer Algorithm, Gupta Algorithm, Flow Shop Scheduling Problem, Apparel Industry.
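Palmer's algorithm, one of the two low-level heuristics named in the abstract above, orders jobs by a slope index that pushes jobs with long processing times on later machines to the front. This is the standard textbook formulation, shown here as a reference point for what GPHH recombines:

```python
def palmer_order(jobs):
    """Sequence flow-shop jobs by Palmer's slope index, largest first.

    jobs: dict mapping job name -> list of processing times on machines 1..m.
    """
    m = len(next(iter(jobs.values())))

    def slope(times):
        # weight (2k - m - 1) grows with the machine index k, so jobs that
        # are slow on later machines get a large (early) priority
        return sum((2 * k - m - 1) * t for k, t in enumerate(times, start=1))

    return sorted(jobs, key=lambda j: slope(jobs[j]), reverse=True)

# Job A is short early and long late, so Palmer schedules it first.
jobs = {"A": [1, 4, 5], "B": [5, 4, 1], "C": [3, 3, 3]}
print(palmer_order(jobs))  # ['A', 'C', 'B']
```

The Gupta heuristic ranks jobs by a different closed-form index; GPHH evolves new dispatching rules out of building blocks like these.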
Wesley Fan1, Eric Wasserman1, Eiffel Vuong1, Dylan Lazar1, Matthew Haase1, Yu Sun2, 11001 Cadence, Irvine, CA 92618, 2California State Polytechnic University, Pomona, CA, 91768
In recent decades, an increasing number of people, ranging from children to the elderly, have become overweight. Consequently, a series of diseases comes along with obesity. How to control weight effectively is a big concern for many people. To improve people's awareness of their diets and calorie intake, this paper develops an application, Fitable, which can help users by calculating the calories burned in a particular workout. The foods that Fitable recommends are all based on the lifestyle the user is aiming to achieve. At present, the app is available to Android users.
Android, flutter, firebase, machine learning.
Djihene Bourenane1, Noria Taghezout2 and Nawal Sad Houari3, 1Computer Science Laboratory Oran (LIO), Department of Computer Science, University Oran 1, Oran, Algeria, 2Computer Science Laboratory Oran (LIO), Department of Computer Science, University Oran 1, Oran, Algeria, 3Computer Science Laboratory Oran (LIO), Département du Vivant et de l’Environnement, University USTO-MB, Oran, Algeria
Collaboration between companies provides a means of communication between individuals who belong to different entities and share the common goal of solving a specific problem. Collaborative work involves the sharing of expertise and resources. However, this sharing of resources must be done reliably and rapidly for effective decision making. In this article, we propose an agent-based architecture for a collaborative decision support system that involves recommendation and negotiation, in order to facilitate the allocation of a service by an enterprise under fixed multiple criteria. This contribution offers a collective behavioural environment that ensures personalized search and time savings. The multi-agent system is composed of five agents: the Manager Agent, Customer Agent, Interpreter Agent, Recommendation Agent, and Delivering Agent. A domain ontology is built to structure the semantic relations between terms and concepts in order to extend the semantic scope of requests.
Collaboration, Decision Support, Multi-Agent Systems, Negotiation, Ontology, Recommendation & Services.
Nina Luo1, Caroline Kwan2, Yu Sun3 and Fangyan Zhang4, 1Claremont, CA 91711, 2Westridge, Pasadena, CA 91105, 3California State Polytechnic University, Pomona, CA, 91768, 4ASML, San Jose, CA, 95131
Online reviews now influence many purchasing decisions. However, the length and significance of these reviews vary, especially when reviewers have different criteria for making their assessments. In this paper, we present an efficient method for analyzing restaurant reviews on the popular review site Yelp. We have created an application that uses web scraping, natural language processing, and a blacklist to recommend customers' favorite dishes from restaurants. To test the app, we conducted a qualitative evaluation of the approach. By analyzing two different ways of obtaining Yelp reviews and evaluating our word filtering process, we found that an average of 51% of non-food words are filtered out by the blacklist we made. We provide further details of its deployment and user interface design, and compare it to the opinion mining field, which uses similar tools to make financial market predictions based on perceived public opinion on social media.
Web scraping, natural language processing, flutter, iOS, android.
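The blacklist filtering step from the abstract above can be sketched crudely: drop known non-food words and count what survives across reviews. The blacklist and reviews below are illustrative toys; the real app pairs a much longer list with scraping and NLP:

```python
from collections import Counter

# Illustrative blacklist of common non-food review words; the app's real
# list is far longer and is combined with NLP-based filtering.
BLACKLIST = {"service", "staff", "place", "time", "price", "wait", "table"}

def favorite_dishes(reviews, top=3):
    """Count surviving words across reviews and return the most frequent."""
    counts = Counter()
    for review in reviews:
        for word in (w.strip(".,!?").lower() for w in review.split()):
            # keep alphabetic words long enough to be dish candidates
            if word.isalpha() and len(word) > 3 and word not in BLACKLIST:
                counts[word] += 1
    return [word for word, _ in counts.most_common(top)]

reviews = [
    "Great service but the ramen was amazing!",
    "The ramen and the gyoza beat the price.",
]
print(favorite_dishes(reviews, top=1))  # ['ramen']
```

Repeated mentions across many reviews are what separate actual dishes from the residual non-food words the blacklist misses.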
Prazwal Chhabra, Rizwan Ali and Vikram Pudi, International Institute of Information Technology, Hyderabad, India
Team recommendation has always been a challenging aspect of team sports. Such systems aim to recommend the player combination best suited against the opposition, resulting in an optimal outcome. In this paper, we present CRICTRS, a semi-supervised statistical approach to building a team recommendation system for cricket by modelling players as embeddings. To build these embeddings, we design a qualitative and quantitative rating system that also considers the strength of the opposition when evaluating a player's performance. The embeddings obtained describe the strengths and weaknesses of the players based on their past performances. We also address a critical aspect of team composition, namely the number of batsmen and bowlers in the team. Team composition changes over time, depending on factors that are hard to predict, so we take it as input from the user and use the player embeddings to decide the best possible team combination for the given composition.
Data Mining and Data Analytics, Cricket Analytics, Player Representation, Embeddings, Recommendation Systems.
Wilson Zhu1, Yu Sun2, 1Diamond Bar, CA 91765, 2California State Polytechnic University, Pomona, CA, 91768
Standardized tests such as the SAT often require students to write essays, and a large number of graders must be hired to evaluate these essays, which is time- and cost-consuming. Using natural language processing tools such as Global Vectors for Word Representation (GloVe) and various types of neural networks designed for image classification, we developed an automatic grading system that is more time- and cost-efficient than human graders. We applied our application to a set of manually graded essays provided by a 2012 Kaggle competition on automated essay grading and conducted a qualitative evaluation of the approach. The results show that the program correctly scores most of the essays and gives an evaluation close to that of a human grader on the rest. The system proves effective in evaluating various essay prompts and capable of real-life application, such as assisting another grader or even serving as a standalone grader.
Machine learning, auto grading system, neural network model.
Mahbubur Rahman, Department of Computer Science, North American University, Stafford, Texas, USA
Learning from multidimensional data has been an interesting concept in the field of machine learning. However, such learning becomes difficult, and sometimes complex and expensive, because of the costly data processing and manipulation required as the number of dimensions increases. We therefore introduce an ordered, index-based data organization model, since an ordered data set provides easier and more efficient access than an unordered one. The ordering maps the multidimensional dataset into a reduced space and ensures that the information can be retrieved back and forth efficiently. We have found that such multidimensional data storage can enhance both unsupervised and supervised machine learning computations.
Multidimensional, Euclidean norm, cosine similarity, database, model, hash table, index, K-nearest neighbour, K-means clustering.
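One concrete way to realize an ordered index like the one in the abstract above is to sort vectors by Euclidean norm and prune nearest-neighbour searches with the triangle inequality, which guarantees |norm(a) - norm(b)| <= dist(a, b). The abstract does not spell out its ordering, so this is an illustrative sketch rather than the paper's actual model:

```python
import bisect
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def build_index(vectors):
    """Order vectors (tuples) by Euclidean norm: a 1-D 'reduced space'."""
    return sorted((norm(v), v) for v in vectors)

def nearest(index, query):
    """Scan outward from the query's norm, pruning with the triangle
    inequality: once the norm gap alone exceeds the best distance found,
    no remaining vector can be closer."""
    qn = norm(query)
    lo = bisect.bisect_left(index, (qn,)) - 1
    hi = lo + 1
    best, best_d = None, float("inf")
    while lo >= 0 or hi < len(index):
        # advance on the side whose norm is closer to the query's norm
        take_hi = hi < len(index) and (lo < 0 or index[hi][0] - qn <= qn - index[lo][0])
        n, v = index[hi] if take_hi else index[lo]
        if take_hi:
            hi += 1
        else:
            lo -= 1
        if abs(n - qn) > best_d:
            break  # every remaining vector is at least this far away
        d = math.dist(query, v)
        if d < best_d:
            best, best_d = v, d
    return best

points = [(0, 0), (3, 4), (1, 1), (10, 0)]
index = build_index(points)
print(nearest(index, (3, 3)))  # (3, 4)
```

Because the scan stops as soon as the norm gap exceeds the best distance, well-separated data needs only a handful of distance computations per query instead of a full scan.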
David Hu1, Jack Gao2, Yu Sun3, Fangyan Zhang4, 1Irvine, CA 92618, 2Irvine, CA 92618, 3California State Polytechnic University, Pomona, CA, 91768, 4ASML, San Jose, CA, 95131
Natural disasters are events that frequently occur around the world, destroying homes and taking innocent lives within minutes or even seconds. Only when we see news of fatalities around the globe do we realize how fragile human life is. The most concerning problem is that current disaster training procedures are insufficient to prepare the general public for natural disasters such as earthquakes, which can wipe out thousands of homes and cause massive casualties if people are not properly prepared. To address this situation, we built a first prototype, the Safety Lifetime earthquake simulation game. We believe simulation-based learning covers more information and makes the lessons more memorable. As a prototype, Safety Lifetime contains a simulation of a real earthquake and lessons guiding the user through what to do during an earthquake, what items to collect, where the safest shelter locations are, and more. To verify the effectiveness of the training system, we performed a small-scale user study. Ten users were divided into two groups: Group A was given booklet-based earthquake educational materials, while Group B was provided with the game system. Each group spent 10 minutes learning the content and then finished a quiz. The results show that the average score of Group A was 8.5/10, while that of Group B was 9.3/10.
Disaster training, earthquake, simulation-based learning, training system.
Chun-peng Chang, Wen-Jen Ho, Yung-chieh Hung, Kuei-Chun Chiang, Bill Zhao, Institute for Information Industry, Taipei, Taiwan
We apply energy disaggregation, for both classification and estimation, to 150 AMI (Advanced Metering Infrastructure) smart meters and a small number of HEMS (Home Energy Management System) smart plugs in a community in New Taipei City, Taiwan. The aim of this paper is to clarify how we lower the cost, obtain a model of appliance usage from only a small portion of households, improve it with a simple questionnaire, and generalize it for prediction over collective households. Our investigation demonstrates the benefits and various possibilities for power suppliers and the government, and won the Elite Award at the Presidential Hackathon 2020, Taiwan.
Energy Disaggregation, Non-intrusive Load Monitoring, Deep Learning, Autoencoder.
Breno W. S. R. Carvalho1, Aline Paes2 and Bernardo Gonçalves3, 1IBM Research, Brazil. Institute of Computing, Universidade Federal Fluminense (UFF), Niterói, RJ, Brazil, 2Institute of Computing, Universidade Federal Fluminense (UFF), Niterói, RJ, Brazil, 3IBM Research, Brazil
Semantic Role Labelling (SRL) is the process of automatically finding the semantic roles of terms in a sentence. It is an essential task towards creating a machine-meaningful representation of textual information. One public linguistic resource commonly used for this task is the FrameNet Project. FrameNet is a human- and machine-readable lexical database containing a considerable number of annotated sentences; the annotations link sentence fragments to semantic frames. However, while the annotations across all the documents covered in the dataset link to most of the frames, a large group of frames lack annotations in the corpus documents pointing to them. In this paper, we present a data augmentation method for FrameNet documents that increases the total number of annotations by over 13%. Our approach relies on lexical, syntactic, and semantic aspects of the sentences to provide additional annotations. We evaluate the proposed augmentation method by comparing the performance of a state-of-the-art semantic role labelling system trained on the dataset with and without augmentation.
FrameNet, Frame Semantic Parsing, Semantic Role Labelling, Data Augmentation.
Luyao Peng, Center for the Cognitive Science of Language, Beijing Language and Culture University, Beijing, China
Among deep neural network approaches, Bayesian neural networks (BNNs) are scalable, theoretically grounded, and computationally efficient for making predictions and inferences in NLP tasks. However, successful implementations of BNNs require careful tuning of the hyperparameters of the prior distributions for both the weights and biases in each layer. We apply a hierarchical prior to the parameters of conventional BNNs, so that the prior variances can be selected deterministically and automatically; the resulting model is called an empirical Bayesian neural network (EBNN). We apply EBNNs to the named entity recognition classification task and compare them with several commonly used deep learning models in terms of classification accuracy, training time, and the probabilistic variance of their inferences.
Bayesian Neural Networks, Empirical Bayes, Variational Inference, Deep Learning Model, Named Entity Recognition.
Omar Meriwani, Royston, United Kingdom
The study of the humanities in Arabic lacks quantitative and statistical methods of study and analysis; as a result, the field may lose an important way of looking at huge amounts of facts both comprehensively and at a glance. The study of biographies is one of these important fields, and it is possible to employ text analytics and information retrieval techniques to study it and give a comprehensive picture of different aspects across different periods. This project was carried out in support of the PhD study on "historiography and social change in the late 19th century Mosul" conducted by Omar Mohammed at the École des hautes études en sciences sociales, Centre d'études turques, ottomanes, balkaniques et centrasiatiques, in France. The project aims to create a social graph from targeted text in the book "The Mosul Encyclopedia of Biography". The main tasks of the study are extracting Arabic full names, linking them to their articles in the encyclopedia, finding name similarities, and extracting the relationships between persons. Secondary tasks include classifying persons' occupations and religious and political activities according to their articles' texts, and finding and creating virtual profiles of secondary persons from their mentions in the encyclopedia; finally, all the extracted information is visualized in a web-based social graph. Our methods achieved high performance: 94% F-score in person extraction, 96% F-score in linking persons with articles, 74% F-score in same-person detection, and 74% F-score for relation classification, with 61% accuracy on unseen data. The classification models, however, were not successful.
Digital Humanities, Names Extraction, Biography Analysis, Social Graph, Arabic Text Classification, Arabic Relations Extraction.