Continuing this short series on Twitter analysis, today I harvested 500 tweets about bioinformatics to investigate the retweet connections. My question was very simple: who retweets whom? To answer it, we can draw the retweets as a network (here is the code in R). The direction of the arrows (e.g. @user2 -> @user1) indicates that @user2 retweets @user1. This is the result:
162 of the 500 collected tweets contained a retweet entity. A quick frequency analysis shows the eight most retweeted users (percentages are relative to the 162 retweets):
- genetics_blog = 26 (16%)
- bffo = 14 (8.6%)
- phylogenetics = 8 (5%)
- dullhunk = 8 (5%)
- JChrisPires = 8 (5%)
- brent_p = 7 (4.3%)
- michaelbarton = 7 (4.3%)
- pablopareja = 6 (3.7%)
You can identify these users in the graph because each one sits at the center of a star-shaped group.
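The retweet extraction and the frequency count can be sketched in a few lines. This is a Python illustration, not the R code linked above. It assumes each tweet is available as a (screen name, text) pair and that retweets use the classic "RT @user" prefix; the function names and the sample tweets are made up for the example.

```python
import re
from collections import Counter

# Assumption: retweets are marked with the classic "RT @user" prefix.
RT_PATTERN = re.compile(r"\bRT @(\w+)")

def retweet_edges(tweets):
    """Return (retweeter, original_author) pairs from (screen_name, text) tuples."""
    edges = []
    for screen_name, text in tweets:
        match = RT_PATTERN.search(text)
        if match:
            edges.append((screen_name, match.group(1)))
    return edges

def most_retweeted(edges, top=8):
    """Return (user, count, percent of all retweets) for the most retweeted users."""
    counts = Counter(author for _, author in edges)
    total = len(edges)
    return [(user, n, round(100 * n / total, 1))
            for user, n in counts.most_common(top)]

# Made-up sample data, only to show the shape of the input.
sample = [
    ("alice", "RT @genetics_blog: new post on GWAS"),
    ("bob", "RT @genetics_blog: new post on GWAS"),
    ("carol", "Reading about bioinformatics pipelines"),
    ("dave", "RT @bffo: bioinformatics jobs"),
]

edges = retweet_edges(sample)
print(edges)                  # each pair is one arrow: retweeter -> original author
print(most_retweeted(edges))
```

Each pair in `edges` corresponds to one arrow in the network, and the users with the highest counts are the ones that end up at the center of the star-shaped groups.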
I had a question in mind: what do people tweet about genetics and genomics? After searching for those terms on Twitter, I harvested the results (1700 tweets) and produced a graphical output that let me see the main and most frequent term associations (here is the code in R). And this is what I got:
I color-coded the terms with the help of a cluster analysis. Roughly, we can classify the results into eight groups. In red we have topics such as bigdata and machine learning. The dark green cluster is related to the WLSA (Wireless-Life Sciences Alliance). The light green group comprises the medical and health tweets. In brown we have the cluster of the American Heart Association clinical series. The pink cluster refers to the Genetics and Genomics Section Meeting at the Marriott Marquis hotel. The most interesting clusters are the orange, purple and yellow ones, which cover molecular, biological and breast cancer topics, respectively.
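One simple way to get at term associations like these is to count how often two terms appear in the same tweet. The sketch below is a Python illustration under that assumption and is not the R code linked above; the stopword list, the function names and the sample tweets are all hypothetical. It stops at the pair counts and does not reproduce the cluster analysis or the color coding.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stopword list for the example.
STOPWORDS = {"the", "a", "of", "and", "in", "on", "for", "to", "rt"}

def tokenize(text):
    """Lowercase the text and return its distinct terms, minus stopwords and short words."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOPWORDS and len(w) > 2}

def cooccurrence(tweets):
    """Count, for every pair of terms, the number of tweets containing both."""
    pairs = Counter()
    for text in tweets:
        terms = sorted(tokenize(text))  # sorting keeps each pair in one canonical order
        pairs.update(combinations(terms, 2))
    return pairs

# Made-up sample tweets.
sample_tweets = [
    "Genomics and bigdata in breast cancer research",
    "Breast cancer genetics study",
    "Machine learning for genomics bigdata",
]

pairs = cooccurrence(sample_tweets)
print(pairs[("breast", "cancer")])     # tweets mentioning both terms
print(pairs[("bigdata", "genomics")])
```

Pairs with high counts are the strong associations; a matrix of such counts is the kind of input a cluster analysis could then group into topics.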