Using R for content analysis of short documents

Recently I was trying to extract structure from a large corpus of documents. Nearly all the documents were short; many were just notes of one or two lines. Standard clustering approaches do not work well on texts this short, since any two documents tend to share very few words. Nonetheless, after doing some research I found a suitable method, which I was able to apply to the data using the statistical programming language R.

