
Machine Learning Based Model for Detecting Similarity of Scientific Papers

ABSTRACT
ABSTRACT (in Chinese)
ACKNOWLEDGEMENTS
DEDICATION
LIST OF ABBREVIATIONS
Chapter ONE: INTRODUCTION
    1. Background
        1.1 Thesis Structure
Chapter TWO: TEXT PREPROCESSING
    2. Cleaning and Preparing Text Data
        2.1 Removal of Punctuation Marks
        2.2 Stop-word Removal
        2.3 Stemming: determining the base form of a word
        2.4 Lemmatization: determining the base form of a word using a dictionary
        2.5 Tokenization: extracting word tokens
        2.6 Tagging: syntax highlighting
        2.7 Text Chunking: grouping words
        2.8 Parsing
Chapter THREE: LANGUAGE MODELING
    3. Methods of Language Modeling
        3.1 Term Frequency-Inverse Document Frequency (TF-IDF)
        3.2 N-grams
        3.3 Singular Value Decomposition (SVD)
        3.4 Neural Network Based Language Modeling
        3.5 Convolutional Neural Network Language Models
        3.6 Recurrent Neural Network Language Models
        3.7 Word2vec: Vector Representation of Words
        3.8 GloVe: Global Vectors for Word Representation
Chapter FOUR: CLUSTERING TEXT DATA
    4. Methods for Clustering Texts
        4.1 K-means Algorithm
        4.2 Hierarchical Clustering
        4.3 Spectral Clustering
        4.4 Clustering Using RNNs
        4.5 Convolutional Clustering
Chapter FIVE: IMPLEMENTING THE LANGUAGE MODEL
    5. Clustering Scientific Papers Based on Word Vectors
        5.1 Data Collection
        5.2 Technical Specification
        5.3 Cleaning and Preparing the Text Data
        5.4 Creating the Language Model
        5.5 Capturing Linguistic Similarity Between Papers
        5.6 Results
Conclusion
REFERENCES

Total length: 110 pages.