Current state of our home project

Quick overview

Last summer my cousin and I started working on an idea: crawl the comment sections of the internet and harness the data in them. Our current goal is to help marketers get useful insights from this data about the effects of their campaigns, releases, and presence in the digital world.

For example, a Chinese brand releases new phones. How do they get feedback from their users, beyond looking at sales numbers, returned handsets, and perhaps emails from customers about features that don’t work?

Our quest is to solve this problem by building a huge dataset of comments and analysing it, so that we can give brands useful insight into:

  • How positively their brand is perceived.
  • What people see as positive or negative about specific products.
  • A yearly/monthly/weekly breakdown of the buzz around them on the internet, with channel distribution (which sites the comments appeared on).
  • Whether people shared their own ideas or someone else’s content about the product.
  • Conversational clouds: what people mentioned in relation to the product (screen, charger, packaging).
  • Performance comparison with similar products (for example, number of mentions of Samsung vs Xiaomi).
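
To make a few of these reports concrete, here is a minimal sketch of the weekly breakdown, the channel distribution, and the mention comparison. The records, field order, and function names are made up for illustration; it assumes each comment has already been tagged with the brand it mentions, and it is not our actual implementation.

```python
from collections import Counter
from datetime import date

# Hypothetical, hand-written records: (brand mentioned, site, date posted).
comments = [
    ("Samsung", "arstechnica.com", date(2017, 3, 6)),
    ("Xiaomi", "xda-developers.com", date(2017, 3, 7)),
    ("Xiaomi", "xda-developers.com", date(2017, 3, 14)),
    ("Samsung", "xda-developers.com", date(2017, 3, 15)),
    ("Xiaomi", "arstechnica.com", date(2017, 3, 16)),
]


def mentions_per_brand(records):
    """Count how often each brand is mentioned overall."""
    return Counter(brand for brand, _site, _day in records)


def weekly_breakdown(records, brand):
    """Count mentions of one brand per (ISO year, ISO week number)."""
    weeks = Counter()
    for b, _site, day in records:
        if b == brand:
            iso = day.isocalendar()
            weeks[(iso[0], iso[1])] += 1
    return weeks


def channel_distribution(records, brand):
    """Count on which sites a brand was mentioned."""
    return Counter(site for b, site, _day in records if b == brand)


print(mentions_per_brand(comments))
print(weekly_breakdown(comments, "Xiaomi"))
print(channel_distribution(comments, "Xiaomi"))
```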

Current state of the solution

Our application’s backend is based on ‘templates’ (scripts) that instruct the crawler service to fetch the comments from various forums. Currently these are a few tech sites (Ars Technica, XDA Developers), and we have 12M documents at the moment. The comments are then analysed in microservices for their language, their sentiment, and their sentence structure. This information is saved into a database, and the data is then indexed in an Elasticsearch cluster so we can query it quickly.
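
The flow described above (analyse, store, index) can be sketched roughly like this. Everything here is a stand-in: the `Comment` class, the toy word lists, the ASCII-based "language detection", and the in-memory list and dict that play the roles of the database and the search index are all assumptions for illustration, not how our services actually work.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    site: str
    text: str
    language: str = "unknown"
    sentiment: str = "neutral"


# Toy word lists; a real sentiment service would use a trained model.
POSITIVE = {"great", "love", "fast"}
NEGATIVE = {"broken", "slow", "hate"}


def detect_language(comment):
    # Placeholder heuristic: treat pure-ASCII text as English.
    comment.language = "en" if comment.text.isascii() else "unknown"
    return comment


def score_sentiment(comment):
    # Compare how many positive and negative words the comment contains.
    words = set(comment.text.lower().split())
    pos, neg = len(words & POSITIVE), len(words & NEGATIVE)
    if pos > neg:
        comment.sentiment = "positive"
    elif neg > pos:
        comment.sentiment = "negative"
    return comment


def run_pipeline(comments, stages):
    store = []  # stands in for the database
    index = {}  # stands in for the search index: word -> positions in store
    for comment in comments:
        for stage in stages:
            comment = stage(comment)
        store.append(comment)
        for word in set(comment.text.lower().split()):
            index.setdefault(word, []).append(len(store) - 1)
    return store, index


store, index = run_pipeline(
    [Comment("xda", "love the fast charger"), Comment("ars", "screen is broken")],
    [detect_language, score_sentiment],
)
print([c.sentiment for c in store], index["charger"])
```

Keeping each analysis step as a separate function mirrors the microservice split: a step can be swapped or added without touching the others.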

The frontend (you can access it here) currently lets you search the data and create some very basic pie charts based on keywords.
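
For the curious, a keyword pie chart of this kind could be backed by an Elasticsearch query along the following lines. This is only a sketch: the field names `text` and `site` and the function name are assumptions, not our real mapping, and the terms aggregation assumes the site field is indexed as a keyword.

```python
import json


def keyword_by_site_query(keyword, text_field="text", site_field="site", top_n=10):
    """Build a query body that counts matching comments per site."""
    return {
        "size": 0,  # we only want the aggregation buckets, not the hits
        "query": {"match": {text_field: keyword}},
        "aggs": {
            "by_site": {"terms": {"field": site_field, "size": top_n}},
        },
    }


# The resulting dict can be sent as the JSON body of a search request.
print(json.dumps(keyword_by_site_query("xiaomi"), indent=2))
```

Each bucket in the `by_site` aggregation of the response would become one slice of the pie.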

The architecture without the language detection and the sentiment analysis services looks like this:

The tech stack is:

  • Spring Boot for microservices
  • Angular 1 + Bootstrap for the frontend
  • MongoDB for storage
  • Elasticsearch for indexing & querying

MVP

Things to add so we can demo it:

  • Create views that show where keywords were mentioned, broken down by time, site, and language; the sentiment around keywords; and conversational clouds. Basically any statistics that deliver value with as little development time as possible.
  • Add user/group/organisation management so that only people with the right permissions can access the data and generate reports. (We planned to use Stormpath, but its future looks kinda shaky.)
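
As a rough idea of what the access rule in the second point could look like, here is a minimal role-based check. The role names, permission names, and the `User` class are hypothetical; nothing like this exists in the application yet.

```python
from dataclasses import dataclass

# Hypothetical roles and the permissions each one grants.
ROLE_PERMISSIONS = {
    "viewer": {"search"},
    "analyst": {"search", "generate_report"},
    "admin": {"search", "generate_report", "manage_users"},
}


@dataclass(frozen=True)
class User:
    name: str
    organisation: str
    roles: frozenset


def is_allowed(user, permission, organisation):
    """Allow an action only inside the user's own organisation and role set."""
    if user.organisation != organisation:
        return False
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in user.roles)


alice = User("alice", "acme", frozenset({"analyst"}))
print(is_allowed(alice, "generate_report", "acme"))
print(is_allowed(alice, "manage_users", "acme"))
```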

Our current goal is to deliver the MVP in 2-3 months and get feedback from customers as fast as possible so we can make sure we are heading in the right direction.

 

Thanks for reading. We appreciate any feedback or ideas, in the comments or by email.

Best lightweight Git service for your Raspberry Pi/SOHO server

Not a paid ad.

Finally found what I’ve been looking for: Gitea. It’s love at first sight, not only because of the beautiful UI, but because it doesn’t use SO MUCH GODDAMN MEMORY, which is expensive in the cloud for something as mundane as version control. For the past few years I have run GitLab and Bitbucket/Stash servers for my personal projects, but they used too much memory considering that the server was used by 2 people tops (GitLab recommends 4 GB; it runs with 1 GB RAM + 3 GB swap). The problem with them is that they are heavyweight stacks (Bitbucket/Stash runs on Java, GitLab on Ruby on Rails) designed for massive scalability. Gitea, on the other hand, is a lightweight Go service forked from Gogs that consumes ~30 MB of memory under light usage. It is also blazing fast and has great ticket management and a built-in wiki. It’s almost as good as GitHub.

You can get it here: https://gitea.io/

It’s also fairly quick to set up:

  • clone it
  • create a user for it
  • register it as a service (optional, since it’s only a single binary)
  • edit the config file
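
For the config step, a small script like the one below can render a starting point for Gitea’s INI-style config file. Treat it as a sketch: the section and key names are the ones I remember from Gitea’s sample config, so check them against the documentation for your version, and the domain, port, and database path are placeholder values.

```python
import configparser
import io


def render_gitea_config(domain, http_port=3000, db_path="data/gitea.db"):
    """Render a minimal INI config with a server and a database section."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep the upper-case key names as written
    cfg["server"] = {
        "DOMAIN": domain,
        "HTTP_PORT": str(http_port),
        "ROOT_URL": f"http://{domain}:{http_port}/",
    }
    cfg["database"] = {
        "DB_TYPE": "sqlite3",  # assumption: SQLite is enough for 1-2 users
        "PATH": db_path,
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


# Placeholder hostname; replace with your own before writing the file.
print(render_gitea_config("git.example.lan"))
```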

GOD BLESS the great guys who designed Go, so people can write efficient applications and don’t have to bother with C and memory allocation anymore.