Lately, I have been working on an interesting new project that uses NLP tools and resources for the Slovak language to bring real value to a customer. So far it has been quite challenging, since very few resources, corpora and tools are available. I have decided that the best approach is to track down all the usable NLP tools and resources on the internet (anything at least somewhat usable for Slovak), compile them on GitHub, and release my own tools as open source so that we can move forward together. You can find the links here:

Feel free to contribute by linking to your source code, live services or corpora in a pull request.

The rant

I have a hypothesis that the state of NLP for the Slovak language is so bad mainly because of a lack of interest from the IT industry and the limited potential of the Slovak market, and also because of the nature of our nation and our way of thinking. If we look at our neighbors, the Czechs, we can see a lot of resources, corpora and tools. I suspect that they can handle machine processing of Slovak better than we do. Back to my hypothesis: in the Czech Republic, Google was not the primary search engine. People used Seznam, which is driving the NLP industry forward, since many of the available tools and resources are linked directly to Seznam or to people working there. The Czechs are also much more open than we are; they don't mind sharing their work, and that propels their research in this field even further.

