I am experimenting with Hadoop and the Hortonworks and Cloudera distributions in order to do some simple text analytics. All the examples I have found on the web so far (e.g. word count) produce only a single column of results. But I have many text files to which word count must be applied, and the results must be saved in a spreadsheet, with each file's counts in a separate column. So I am wondering: what is the easiest way to do text analytics with Hadoop in conjunction with spreadsheets? The functions I need are:
- transform to lower case
- filter out stopwords
- transpose the results (one column per file)
- write to Excel
Can this be accomplished easily with Pig, RHadoop, or something else?
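To make the requirement concrete, here is a minimal local sketch in plain Python (not Pig or RHadoop, and not an actual Hadoop job) of the map and reduce logic for the four steps listed above. The stopword list, the tokenizing regex, and the function names `map_document`, `reduce_counts`, `transpose`, and `write_csv` are all hypothetical. It writes CSV, which Excel opens directly, rather than a native `.xlsx` file (that would need a third-party library such as openpyxl).

```python
import csv
import io
import re
import sys
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real job would load a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}

# Assumed tokenizer: runs of lower-case letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def map_document(filename, text, stopwords=STOPWORDS):
    """Map step: emit ((filename, word), 1) for every non-stopword token."""
    for token in TOKEN_RE.findall(text.lower()):  # 1. transform to lower case
        if token not in stopwords:                  # 2. filter out stopwords
            yield (filename, token), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted 1s per (filename, word) key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def transpose(counts):
    """3. Pivot {(file, word): n} into word rows with one column per file."""
    files = sorted({f for f, _ in counts})
    words = sorted({w for _, w in counts})
    header = ["word"] + files
    rows = [[w] + [counts.get((f, w), 0) for f in files] for w in words]
    return header, rows


def write_csv(header, rows, out):
    """4. Write the table as CSV, which Excel can open as a spreadsheet."""
    writer = csv.writer(out)
    writer.writerow(header)
    writer.writerows(rows)


if __name__ == "__main__":
    docs = {"a.txt": "The cat and the hat", "b.txt": "A cat, a dog."}
    pairs = [kv for name, text in docs.items() for kv in map_document(name, text)]
    header, rows = transpose(reduce_counts(pairs))
    write_csv(header, rows, sys.stdout)
```

With the two sample documents, the resulting table has the rows `cat,1,1`, `dog,0,1` and `hat,1,0` under the header `word,a.txt,b.txt`. Keying the counts by `(filename, word)` rather than by word alone is what makes the per-file columns possible in the transpose step.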