When data analytics meets Bollywood

A Drishyam case study!

Posted by Ankit Shukla on August 8, 2015

Having worked on social applications for some Bollywood movies in the past, and coming from an IT and marketing background, I always wondered whether I could marry geeky data analysis with B-town entertainment.

So I chose a recent Bollywood release, the Ajay Devgn and Tabu starrer "Drishyam", and dived into a new side project that kept me awake at night and had me thinking while traveling.

Data source and Volume

Having worked with various social APIs for a long time, I decided to collect social data around the movie. I headed straight to Twitter and used its streaming API to collect every tweet matching the keyword 'Drishyam'. I started the automated collection on the night of 24th July and stopped on 6th August, which covers the pre-release and post-release periods of the movie about equally. I used only one platform (Twitter) and one keyword (Drishyam) to keep the process simple and less data intensive. Data like this is publicly available to everyone through the Twitter API.

When I started processing the data, I had a whopping number of tweets waiting for me.

Total Number of tweets from 9 PM 24th July to 1 PM 5th August : 2,67,102.

Total Number of unique user accounts who tweeted : 18,764.

The available data set consisted of the tweet text, the time of each tweet and the user handle that posted it. What follows is the story of how I used this simple data to produce vital analytics about the movie on Twitter.

Date wise distribution of the buzz

To calculate the trend of the buzz, I decided to plot the number of tweets posted against the date. After applying some simple string manipulation functions to the timestamp string, I had the following plot.

Data is from 9pm 24th July to 2 PM 5th August

We can clearly see three major peaks in the graph, on 26th July, 29th July and 31st July, when conversations around the movie were at their highest. On 26th July, Ajay Devgn shared the stage with Kapil Sharma to promote the movie on Comedy Nights with Kapil, the highest-TRP comedy show in India. The second peak, on 29th July, corresponds to several contests held to promote the movie (more on this in the Top Conversations section below). The third peak corresponds to the release date, when various movie critics published their reviews (more on this in the top contributors section).
Inference: Promoting a movie on a popular TV show does help create buzz, and contests are a sure-shot way to generate user interest and buzz.
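
For readers who want to try the date-wise count themselves, here is a minimal sketch. It is not the original script: it parses the timestamp with `datetime` rather than plain string manipulation, it assumes the timestamps are in Twitter's classic `created_at` layout (UTC), and it assumes dates should be read in Indian Standard Time. The function name `tweets_per_day` and the sample timestamps are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: timestamps look like "Sun Jul 26 15:30:00 +0000 2015" (UTC).
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"
# Assumption: dates are interpreted in Indian Standard Time (UTC+5:30).
IST = timezone(timedelta(hours=5, minutes=30))


def tweets_per_day(timestamps):
    """Count tweets per calendar day (IST), returned in date order."""
    counts = Counter()
    for raw in timestamps:
        local = datetime.strptime(raw, TWITTER_FORMAT).astimezone(IST)
        counts[local.date().isoformat()] += 1
    return dict(sorted(counts.items()))


# Hypothetical sample timestamps, not taken from the real data set.
sample = [
    "Sun Jul 26 15:30:00 +0000 2015",
    "Sun Jul 26 16:45:10 +0000 2015",
    "Sun Jul 26 19:00:00 +0000 2015",  # already 27 July in IST
    "Fri Jul 31 04:00:00 +0000 2015",
]
daily = tweets_per_day(sample)
for day, count in daily.items():
    print(day, count)
```

The resulting date-to-count mapping can be fed directly to any plotting library to draw the trend line.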

Time wise distribution of tweets

To find out at what time of day the conversations around the movie peaked, I drew a radar graph of time of day vs. number of tweets.

Here we can see that conversations peak in the evening between 7 PM and 10 PM, which is also when most people are active on Twitter. The conversations slow down from 10 PM and reach their minimum at 4 AM, after which they grow steadily until 10 PM. There is also an increase between 2 PM and 3 PM, which is lunch time in organizations and colleges, when people return to social networks and create buzz around what they like.
Inference: It is easy to build trends during late night and early morning, but the reach of such trends is minimal. The best times to tweet are lunch time [1 PM to 3 PM] and late evening, when people come back from work [7 PM to 10 PM].
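
The hour-of-day buckets behind a radar graph like this can be sketched as follows. This is an illustration under assumptions, not the original code: it again assumes Twitter's classic `created_at` layout in UTC and converts to Indian Standard Time before bucketing. `tweets_per_hour` and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps look like "Sun Jul 26 15:30:00 +0000 2015" (UTC).
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"
# Assumption: hours are interpreted in Indian Standard Time (UTC+5:30).
IST = timezone(timedelta(hours=5, minutes=30))


def tweets_per_hour(timestamps):
    """Return a 24-element list: index is the hour of day (IST), value is the tweet count."""
    counts = [0] * 24
    for raw in timestamps:
        local = datetime.strptime(raw, TWITTER_FORMAT).astimezone(IST)
        counts[local.hour] += 1
    return counts


# Hypothetical sample timestamps, not taken from the real data set.
sample = [
    "Sun Jul 26 15:30:00 +0000 2015",  # 21:00 IST
    "Mon Jul 27 08:15:00 +0000 2015",  # 13:45 IST
    "Mon Jul 27 22:30:00 +0000 2015",  # 04:00 IST the next day
    "Tue Jul 28 15:59:59 +0000 2015",  # 21:29 IST
]
hourly = tweets_per_hour(sample)
peak_hour = max(range(24), key=lambda hour: hourly[hour])
print("Busiest hour of day (IST):", peak_hour)
```

The 24 values map one-to-one onto the spokes of a radar (polar) chart.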

Top Conversations

To get a sense of the popular topics in the movie buzz, I decomposed every tweet into its constituent words, which gave me a total of 50,00,000+ mentions of 75,000+ unique words [more than 2 characters in length]. I calculated the frequency of each word and plotted the top 10 meaningful words for the whole period, as well as the top 20 for each date.
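
A word-frequency count along these lines can be sketched in a few lines of Python. This is only an illustration: the tokenizing rule (lowercase runs of letters, digits and underscores, so `#Drishyam` and `Drishyam` are merged), the tiny `STOPWORDS` set standing in for the "meaningful words" filter, and the sample tweets are all assumptions, not details of the original analysis.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the real "meaningful word" filter is not known.
STOPWORDS = {"the", "and", "for", "with"}


def word_counts(tweets, min_length=3):
    """Count words longer than 2 characters across all tweets."""
    counts = Counter()
    for text in tweets:
        # Lowercase and split on anything that is not a letter, digit or underscore,
        # which also strips the leading '#' and '@' from hashtags and mentions.
        for word in re.findall(r"[a-z0-9_]+", text.lower()):
            if len(word) >= min_length and word not in STOPWORDS:
                counts[word] += 1
    return counts


# Made-up sample tweets for illustration only.
sample = [
    "Watching #Drishyam tonight with family",
    "Ajay Devgn is superb in Drishyam",
    "Drishyam review: Tabu and Ajay shine",
]
counts = word_counts(sample)
top_two = counts.most_common(2)
print(top_two)
```

Running the same function on the tweets of a single date gives the date-wise ranking.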

Top 10 all time popular conversation topics

Unsurprisingly, Ajay Devgn is the center of attention in the conversations, followed by Comedy Nights with Kapil, Tabu and others. The movie grabbed its initial attention and buzz mostly because of its star cast and its promotion on the small screen.

Important and popular conversation topics on 25th July

Important and popular conversation topics on 26th July

Apart from CNWK [Comedy Nights with Kapil] with 35,000+ mentions, the following were the other popular conversation topics on 26th July

Important and popular conversation topics on 27th July

Apart from '4 days to Drishyam' with 7,800+ mentions, the following were the other popular conversation topics on 27th July

Important and popular conversation topics on 28th July

Important and popular conversation topics on 29th July

Important and popular conversation topics on 30th July

Important and popular conversation topics on 31st July

Important and popular conversation topics on 1st August

Important and popular conversation topics on 2nd August

Important and popular conversation topics on 3rd August

Important and popular conversation topics on 4th August

Important and popular conversation topics on 5th August

You need FANs, Die hard Fans

To drive such a massive conversation, any campaign or brand needs a cult fan following, and Ajay Devgn is a versatile megastar with a huge crowd of fans. In this post I have categorized fans as users who posted more than 100 tweets about Drishyam!
Total number of Twitter fans: 460
Trivia: 60% of all the tweets were posted by these 460 users!
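
The fan rule above (more than 100 tweets per account) and the share of tweets those accounts contribute can be sketched like this. The function name `find_fans` and the sample handles are hypothetical, and the input is assumed to be one user handle per collected tweet.

```python
from collections import Counter


def find_fans(handles, threshold=100):
    """Return ({handle: tweet_count} for fans, fraction of all tweets posted by fans).

    `handles` holds one user handle per tweet; a fan has MORE than `threshold` tweets.
    """
    per_user = Counter(handles)
    fans = {user: n for user, n in per_user.items() if n > threshold}
    share = sum(fans.values()) / len(handles) if handles else 0.0
    return fans, share


# Made-up handles for illustration only.
sample = (
    ["fan_a"] * 150
    + ["fan_b"] * 101
    + ["casual_c"] * 100  # exactly 100 tweets, so not a fan under this rule
    + ["casual_d"] * 49
)
fans, share = find_fans(sample)
print(len(fans), "fans posted", round(share * 100, 1), "% of the tweets")
```

Sorting the `fans` dictionary by value and keeping the first 30 entries gives the data for a top-30 plot.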

Top 30 Fan accounts

Here is a plot of the top 30 fans: user handles vs. number of tweets posted.

Influencers : The most critical agents

While fans can generate a lot of content and buzz about the movie, they generally have a limited number of followers. Influencers, on the other hand, are people with a large number of followers, and even one tweet from them can amplify the buzz manyfold. I have categorized influencers as accounts with a follower count of 5000+.
Total influencers who tweeted about Drishyam: 236
Total reach (not necessarily unique): 2,50,00,000 users
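
Here is a small sketch of the influencer filter and reach figure. It assumes that "reach" means the plain sum of follower counts across influencer accounts, which would explain why it is not necessarily unique; that reading is an assumption. The first three sample rows are taken from the list below, while `small_account` and the function name `summarize_influencers` are hypothetical.

```python
def summarize_influencers(accounts, min_followers=5000):
    """Filter (handle, tweet_count, followers) rows and total their followers.

    Assumption: reach is the sum of follower counts, so users who follow
    several influencers are counted more than once.
    """
    influencers = [row for row in accounts if row[2] >= min_followers]
    influencers.sort(key=lambda row: row[2], reverse=True)
    reach = sum(followers for _, _, followers in influencers)
    return influencers, reach


sample = [
    ("htTweets", 7, 2250949),
    ("ajaydevgn", 4, 3809783),
    ("ibnlive", 48, 1600525),
    ("small_account", 12, 800),  # hypothetical account below the 5000 cut-off
]
influencers, reach = summarize_influencers(sample)
for handle, tweets, followers in influencers:
    print(handle, tweets, followers)
print("Total reach:", reach)
```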

Popular 30 Influencers

[User Handle | No. of tweets | Followers]

  1. ajaydevgn 4 3809783
  2. htTweets 7 2250949
  3. abpnewstv 9 2072334
  4. liputan6dotcom 3 1976642
  5. ibnlive 48 1600525
  6. ZeeNews 8 1263979
  7. AmitShahOffice 2 1155845
  8. ArshadWarsi 5 1064484
  9. kamaalrkhan 53 1053159
  10. taran_adarsh 14 1045496
  11. PritishNandy 8 877246
  12. ColorsTV 11 644487
  13. khushsundar 2 505963
  14. Chinmayi 1 467049
  15. priyaguptatimes 44 373148
  16. Gotham3 6 292119
  17. ritesh_sid 8 287411
  18. boxofficeindia 23 262601
  19. Bollyhungama 31 251857
  20. pinkvilla 21 207890
  21. TOIEntertain 17 160174
  22. rajcheerfull 8 148258
  23. YuvaiTV 1 137611
  24. MovieKeeda 10 125783
  25. alyrazabeig 2 120136
  26. HHCGuiltFree 6 116676
  27. JagranNews 16 105469
  28. SRKFC1 2 102838
  29. SKsCombatant 47 87279
  30. AnimalRightsJen 5 79920

These accounts are worth more than gold for spreading buzz on Twitter. Next time you make something awesome, don't forget to reach out to these online celebs and ask them to tweet about you. Other remarkable influencers like Arvind Kejriwal, Amit Shah etc. got the Drishyam buzz into print media as well.
Media is moving to multiple screens: the small screen helps build buzz for movies and vice versa, and the ever-present screens of our smartphones make the conversations two-way.

The movie is doing great, as seen from the positive popular words after the release; the current rating is 9.1 on IMDb!
If you enjoyed the post and found it informative, or if you have a query, do leave a comment below.

The aim of this post is to show how data can help predict outcomes and make remarkably accurate decisions, even in a creative industry like Bollywood. Although I have made only about 20% of my explorations public in this post, the possibilities are endless.

Interested in more? I am available at ankit@in7h.com and live in Mumbai. This link is my Facebook profile and this is my LinkedIn profile. Let's connect the dots!