- SourceAnalysis: statistics on which channel (client) the tweets were sent from.
- UserAnalysis: identifies the most active users, the most popular users, etc. (top 10 and so on).
- TopicAnalysis: statistics on the hottest topics currently on Twitter (top 10 and so on).
- Follow the instructions at https://github.com/nathanmarz/storm/wiki/Setting-up-a-Storm-cluster to set up a Storm cluster;
- Configure your execution environment: copy the jar files specified in the .classpath file to your $STORM_HOME/lib;
- Run the command:
storm jar target/twitterStreamAnalysis-0.0.1-SNAPSHOT.jar TwttrStrmAnlyst.StreamAnalysisTopo twitterStream
This program downloads Twitter messages from the Twitter streaming API and saves them to this folder in files named "YYYY-MM-DD-HH", so one file is generated for each hour.
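
The hourly naming scheme can be illustrated with a minimal sketch. It is not taken from the project's source: the class `HourlyFileName` and the method `forInstant` are hypothetical names, and the use of UTC is an assumption (the topology may well use the local time zone of the machine it runs on).

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the project: builds a "YYYY-MM-DD-HH" name.
class HourlyFileName {
    // Assumption: timestamps are interpreted in UTC.
    private static final DateTimeFormatter HOUR_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd-HH").withZone(ZoneOffset.UTC);

    // Returns the name of the file that a message received at `timestamp` would go to.
    static String forInstant(Instant timestamp) {
        return HOUR_FORMAT.format(timestamp);
    }

    public static void main(String[] args) {
        // All messages received within the same hour map to the same file name.
        System.out.println(forInstant(Instant.now()));
    }
}
```

Because the name only changes when the hour changes, every message received within one hour is appended to the same file.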
- The program also generates "mostActive" and "mostPopular" user statistics in this folder at 2-minute intervals.
- If access to Facebook, Twitter, YouTube and other web media is blocked in your country or region, you will need to circumvent your ISP's firewall. (This caused me a lot of trouble, since Twitter is officially blocked by the Chinese government for political reasons. I worked around it by starting a VM on Amazon EC2 in Tokyo.)
- A live control panel for the running Storm cluster can be viewed at: http://54.248.240.232:8080/
- A visualization of the statistics can be found at: http://210.75.252.106:8888/twitterAnalysis/ (compatible with Chrome and Firefox)
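
As a rough illustration of how a "top 10" statistic such as "mostActive" could be computed for one 2-minute interval, here is a minimal sketch in plain Java, without Storm. It assumes that "most active" means "most tweets within the interval", which the text above does not spell out. The class `TopUsers` and its methods are hypothetical and do not come from the project, which presumably does this work inside Storm bolts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the project: ranks users by tweet count.
class TopUsers {
    // Counts how many tweets each user posted; one list entry per tweet.
    static Map<String, Integer> countByUser(List<String> tweetAuthors) {
        Map<String, Integer> counts = new HashMap<>();
        for (String author : tweetAuthors) {
            counts.merge(author, 1, Integer::sum);
        }
        return counts;
    }

    // Returns up to n user names, highest count first; ties are broken by name.
    static List<String> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up authors of the tweets seen in one interval.
        List<String> authors = List.of("alice", "bob", "alice", "carol", "bob", "alice");
        System.out.println(topN(countByUser(authors), 10));
    }
}
```

In a streaming setting the counts would be reset, or a sliding window kept, at the end of each interval. The same ranking step could serve the topic statistics by counting hashtags instead of user names.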