Behaviors Modeling and Analysis of Big Data from Web Apps Using Machine Learning and Deep RNN Techniques

ACKNOWLEDGEMENT
ABSTRACT
Thesis Abstract (Translated Abstract)
CHAPTER 1 INTRODUCTION
1.1 Background
1.2 Motivation
1.3 Objectives of the study
1.4 Problem description
1.4.1 Behavior-exhibiting features in data
1.5 Implementation and deployment of deep learning
1.6 Study contributions
1.7 Organization of the Thesis
CHAPTER 2 BACKGROUND LITERATURE REVIEW
2.1 Related works
2.2 Behavior Modeling and Anomaly Detection Literature
2.3 Deep Data Analytics and Behavior Modeling
CHAPTER 3 METHODOLOGY OF DEEP LEARNING USING RNNs
3.1 RNNs and Deep Learning
3.1.1 Characteristics of Deep RNNs
3.2 Methodology and mathematical concepts in RNN and deep learning
3.2.1 Perceptron as a basic neuron
3.2.2 How learning in neural net algorithms happens
3.2.3 Learning using Deep RNNs
3.2.4 Gating mechanism in LSTM to create context awareness
3.3 Feature engineering using the doc2vec modeling approach
3.3.1 Doc2vec modeling
3.3.2 Necessity of a doc2vec model in language modeling
CHAPTER 4 IMPLEMENTATION OVERVIEW
4.1 Introduction to Behavior modeling (BM)
4.2 Functional requirements for the study experiments
4.3 Experiment setup analysis
4.3.1 How to prevent the Vanishing Gradient Problem
4.3.2 Theano Library
4.3.3 Data collection, Feature Analysis & Preparation
4.3.3.2 Feature engineering for input preparation
4.3.3.3 The Importance of word vectors in a vector space model
4.3.4 Keras deep learning library and RNN Model
4.3.4.1 Distributed representation of words in the Doc2vec Matrix
4.3.4.2 Extracting inputs from the doc2vec model
4.3.5 The Algorithm Design and Architecture
4.3.5.1 Training
4.3.5.2 Input layer
4.3.5.3 Output layer
4.3.5.4 Objective function activation using Softmax
4.3.5.5 Compiling and fitting the model
4.3.6 Choice of Efficient parameters
4.3.6.1 Training modes
4.3.6.2 Initialization in layers
4.3.6.3 Activation function
4.3.6.3 Adding Dropout between layers
4.3.6.4 Batch Normalization
4.3.6.5 Choice for model optimization
4.3.6.6 Choice for the Batch size
4.3.6.7 Using Callbacks
4.3.7 Prediction inference
CHAPTER 5 CASE STUDIES, BENCHMARK AND RESULTS
5.1 Introduction
5.1.1 People's Data Acquisition with a web application
5.1.2 Steps followed to realize a case study experiment
5.2 Movie Reviews Case Study 1: The IMDbModel
5.3 Dataset Source
5.3.2 Pre-processing and Data cleaning
5.3.3 Feature vector engineering through Doc2vec Modeling
5.3.4 Learning feature similarity via distributed memory (DM) word modeling
5.3.5 Input vectors and formatting
5.4 Classification with Keras
5.4.1 Extract the Training and testing vectors from the doc2vec vector space model
5.4.2 Initialize a new model
5.4.3 Adding hidden layers to the model
5.4.4 Model compiling and parameter tuning
5.4.5 Performing Training with Keras classifier algorithm
5.4.6 Performance results and model evaluation
5.4.7 Generating visualizations
5.4.8 Perform output predictions
5.4.9 Using the Model for deployment
5.5 USA Travelers' Airline Sentiments Case Study 2: The TwitterDataModel
5.6 Benchmarks setup and Baselines results
5.6.1 kNN classifier
5.6.2 Random forest classifier
5.6.3 Passive-Aggressive Classifier
5.6.4 Benchmarks Setup and results
CHAPTER 6 EVALUATION, ANALYSIS AND BASELINES COMPARISON
6.1 Metrics presentation and Evaluation
6.2 Algorithms evaluation
6.2.1 Training time
6.2.2 Accuracy and Loss Measures
6.2.3 Effect of increasing the number of epochs
6.2.4 The Existence of unbalanced training sets
6.2.5 Confusion Matrix (CM)
6.2.6 ROC Curve and Graph
6.2.7 Accuracy and Loss response curves
6.2.8 Visualizing predictions and class probabilities
6.3 Summary discussion from visual graphs
6.4 Evaluation of Baselines & Performance Comparisons
6.5 Challenges/Limitations encountered
6.6 Deployment
CONCLUSION AND FURTHER SUGGESTIONS
Conclusion
Suggestions for future work and improvements
REFERENCES
Appendix A During the period of Study for a Master's Degree