So far, the author's modeling experience has mainly been with e-commerce reviews, specifically the detection of fake reviews. When we shop online, we consult a product's ratings and reviews and learn from other people's buying experience, and this weighs heavily on whether we decide to purchase. For that reason, many unscrupulous merchants, in order to increase their popularity and sales, deliberately pad their listings with fake positive reviews and mislead consumers.
To understand in more detail how this works, the author once went undercover: wrote a positive review for a certain clothing seller, followed up on and off for half a month, and found that the seller's transaction volume rose steadily. Whether that growth consisted of fake transactions, or of real consumption driven by the word-of-mouth that the fake transactions created, is not explored further here. Either way, the author believes that detecting fake reviews has a certain positive moral significance. In e-commerce reviews, the data obtained is textual. Topic words can be extracted from it with traditional topic models and other machine learning methods, and the resulting core topics can then be used for further modeling.
At the same time, unlike numerical data, text data cannot be consumed directly by a model and must first be converted into a numerical format. Here are some methods for reference. The LDA topic model yields a probability value for each word; although these values are tied to specific topics and specific texts, they can still be used to quantify text to some extent. Word2vec, a text vectorization package published by Google, can also vectorize text, but it behaves more like a bag-of-words model: each word receives a single fixed vector regardless of the sentence it appears in, so it handles the contextual links between words poorly. This shortcoming led to the emergence of BERT, another, more powerful word vectorization tool.
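
To make "converting text into a numerical format" concrete, here is a minimal sketch that uses plain word counts. This is a deliberately simpler stand-in, not LDA, Word2vec, or BERT, and it relies only on the Python standard library. The review strings and the helper names (`tokenize`, `build_vocabulary`, `vectorize`) are hypothetical and chosen purely for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocabulary(docs):
    """Map every distinct token in the corpus to a fixed column index."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}

def vectorize(text, vocab):
    """Turn one text into a count vector; words outside the vocabulary are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocab)
    for tok, n in counts.items():
        if tok in vocab:
            vector[vocab[tok]] = n
    return vector

# Hypothetical review texts, invented for this example.
reviews = [
    "Great quality, fast shipping",
    "Great price, great quality",
    "Arrived late",
]

vocab = build_vocabulary(reviews)
vectors = [vectorize(review, vocab) for review in reviews]

for review, vector in zip(reviews, vectors):
    print(vector, review)
```

Each review becomes a list of integers of the same length, which is the kind of input a classifier can work with. Note that word order is discarded entirely, which is the limitation that context-aware models such as BERT are meant to address.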