We are looking for specialists to check the outputs of our “artificial intelligence”, which analyzes thousands of text reviews of various products.

What it is

Large amounts of unstructured text (reviews, expert articles, tests) are subjected to text analysis, which allows us to identify the main topics discussed (product features) and their sentiment (how users rate these features).

The example below shows an analysis of 914 Czech-language reviews. Users comment most often on the camera's functions/modes (86 opinions in total), and they rate the flip/touch screen best (46 positive opinions and only 4 negative ones).
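To illustrate how such aggregated figures can arise, here is a minimal Python sketch. It assumes the earlier steps have already reduced each review to (feature, sentiment) opinions; the sample opinions are made up, not output from our system, and ranking "best rated" by net score (positive minus negative) is an assumption, not necessarily how our prototype ranks features.

```python
from collections import Counter, defaultdict

# Hypothetical (feature, sentiment) opinions, as if produced by the earlier
# feature-identification and sentiment-determination steps.
opinions = (
    [("flip/touch screen", "positive")] * 4
    + [("flip/touch screen", "negative")]
    + [("functions/modes", "positive")] * 3
    + [("functions/modes", "negative")] * 3
    + [("battery", "positive")]
)

def aggregate(opinions):
    """Count positive and negative opinions for each feature."""
    counts = defaultdict(Counter)
    for feature, sentiment in opinions:
        counts[feature][sentiment] += 1
    return counts

def most_discussed(counts):
    """Feature with the largest total number of opinions."""
    return max(counts, key=lambda f: sum(counts[f].values()))

def best_rated(counts):
    """Feature with the highest net score (positive minus negative); an assumed ranking."""
    return max(counts, key=lambda f: counts[f]["positive"] - counts[f]["negative"])

counts = aggregate(opinions)
for feature, c in sorted(counts.items()):
    print(f"{feature}: {c['positive']} positive, {c['negative']} negative")
print("Most discussed:", most_discussed(counts))
print("Best rated:", best_rated(counts))
```

The net score is used here because a share-based ranking would put the battery, with a single positive opinion, ahead of a feature with many mostly positive ones.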

What do we need you for?

The prototype shows only the aggregated outputs of the analysis; these are produced by two preceding steps, feature identification and sentiment determination.

Our software is not always right, and to improve our results, we need feedback from specialists in the respective areas.

We are currently seeking help primarily with cameras and mobile phones, but we will gradually add more areas. If you are interested in our project, we would be happy to hear from you. As soon as we start working on your topic, you will get first-hand information and may be able to use our software right away.

Jiří Fuchs

Here we report on the milestones we have reached in analyzing text reviews. Let’s take a look at our research.

Motivation

Our team is currently working on a project to help people decide which products to buy. A huge number of opinions and reviews of individual products can be found on the Internet.
These user reviews are scattered across discussion forums, product rating sites, and specialized portals. For an ordinary user, it is difficult to find the information needed, get an overview of it, and form their own opinion.

Methodology

In order to analyze large amounts of unstructured data, we have decided to use machine learning methods. We want to use these data to identify topics that are important to users and to determine their positive or negative attitudes towards individual product features.

Current status

We are currently building crawlers that download user reviews and articles about the selected product group. Each crawler is tailored to the structure of a specific site and extracts the data relevant to analyzing topics and attitudes. So far, we have created eight crawlers, which have downloaded about half a million user reviews and expert articles about two thousand products in two languages (Czech and English).

Problems solved

We had to deal with several issues when acquiring the data. One of the main ones is that different sites label products differently: even for an identical product, the names differ, which complicates matching products across sites. Another problem is that some sites limit the number of accesses by means of a CAPTCHA. The last issue that still needs to be solved is changes in site structure, which cause crawlers to fail.
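Product pairing can be approached in many ways. The following Python sketch shows one simple option, not necessarily the one we use: normalize names, require identical numeric model tokens, and then compare the names with the standard library's `difflib.SequenceMatcher`. The filler-word list, the 0.8 threshold, and the product names are illustrative assumptions.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional

# Hypothetical filler words that some sites append to product names.
STOP_WORDS = {"black", "body", "kit", "digital", "camera"}

def normalize(name: str) -> str:
    """Lowercase, split on non-alphanumerics, and drop filler words."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in STOP_WORDS)

def model_numbers(name: str) -> set:
    """Numeric parts of a name, e.g. {'250'} for 'EOS 250D'."""
    return set(re.findall(r"\d+", name))

def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between the normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def match(name: str, candidates: List[str], threshold: float = 0.8) -> Optional[str]:
    """Most similar candidate with the same model numbers, or None."""
    compatible = [c for c in candidates if model_numbers(c) == model_numbers(name)]
    if not compatible:
        return None
    best = max(compatible, key=lambda c: similarity(name, c))
    return best if similarity(name, best) >= threshold else None

print(match("Canon EOS 250D Black Body", ["Canon EOS 2000D", "CANON EOS-250D"]))
```

The model-number check matters because plain string similarity is weak on model names: "canon eos 250d" and "canon eos 2000d" score roughly 0.9 and would otherwise pass the threshold.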

Conclusion

We have practically completed the first phase of the project, whose task was to create data acquisition tools for the subsequent analysis. In the next phase, we will use machine learning methods to uncover the topics users discuss and their attitudes towards them.

Jan Přichystal

In the introductory article of the Reviews section, we present the Multicriterial Text Analysis Software (MTA) project, which aims to remove information asymmetries in news and reviews.

The MTA team of scientists from CYRRUS ADVISORY, a.s. and Mendel University in Brno uses machine learning methods to analyze text in the field of current news and product reviews.

In the following posts, you can look forward to descriptions of the research questions and the results we have already achieved in this area.

Jiří Fuchs