Current state of our home project

Quick overview

Last summer me and my cousin started to work on our idea to crawl the comments of the internet and harness the data from it. Our current goal is to help the marketers to get useful insights from our data about the effects of their campaigns, releases, and presence in the digital world.

For example, a Chinese brand releases new phones. How do they get info about their users feedback? Besides looking at the numbers of the sales, returned handsets, perhaps emails from the customers about the features not working.

Our quest is to solve this problem by having a huge dataset of comments, and by analysing this we can give useful insight about:

  • How positively their brand is perceived.
  • What do people think is positive/negative about certain products.
  • Yearly/monthy/weekly breakdown of the buzz around them on the internet, with channel distribution. (On which sites were the comments)
  • Did they share their own ideas or shared someone elses content about your product.
  • Conversational clouds, what did people mention in relation to your product (screen, charger, packaging)
  • Performance comparison with similar products. (Number of mentions Samsung vs Xiaomi)
Current state of the solution

Our application’s backend side based on ‘templates’/scripts which instruct the crawler service to get the comments of various forums currently a few tech sites, Ars Techinca, Xda developers we have 12M documets at the moment. Then the comments are analysed in microservices by their language, their sentiment, and the structure of the sentence. This info is saved into a database, then the data is indexed in an elasticsearch cluster so we can quickly query it.

The frontend (you can access it here) currently enables you to search the data and create some really basic pie charts based on keywords.

The architecture without the language detection and the sentiment analysis services looks like this:

The tech stack is:

  • Spring boot for microservices
  • Angular 1 + Bootstrap for the frontend
  • Mongo for storage
  • Elastic for indexing & querying
MVP

Things to add so we can demo it:

  • Create views which show where were keywords mentioned time/site/language. Sentiment of keywords. Conversational clouds. Basically anything statistics what delivers value with little as possible development time.
  • Add user/group/organisation management so only people with certain right can access the data and generate reports. (We planned to use Stormpath but their future is kinda shady)

Our current goal is to deliver the MVP in 2-3 months and get feedback from customers as fast as possible so we can make sure we are heading in the right direction.

 

Thanks for reading, we appreciate any feedbacks/ideas in the comments or in an email.