Netflix’s geek-chic: how one company leveraged its big data to change the entertainment industry

by Tricia Jenkins

In early 2015, Netflix boasted over sixty million subscribers in fifty different countries and announced its plans to be operational in every single country by the end of the following year. The company, as its name suggests, offers its members a large array of film and television content, through either a DVD mail-subscription service or a digital streaming membership for a low monthly cost. While its DVD library is certainly more exhaustive than its digital one, Netflix’s streaming service still offers its members thousands of different titles depending on the region and is so popular in North America that in 2014 it accounted for 34% of all traffic using downstream bandwidth (Govind).

If there is a thesis that summarizes Netflix’s history and development, it is that the company has always been a dangerous game changer for its competition in the entertainment industry. First, its Internet-based movie rental service challenged brick-and-mortar stores, eventually leading to the demise of its primary nemesis—Blockbuster. Now, Netflix is posing a threat to subscription television networks, such as Showtime and HBO, as it has recently begun to commission original content similar in quality and style. However, Netflix Originals have the added advantage of being released an entire season at a time, as opposed to the linear schedule that its competition uses—a major plus for the on-demand generation who loves to watch as much as it wants, when it wants. Furthermore, plans are already in the works for Netflix to challenge movie theaters as the dominant exhibition hub for new release films as the company continues to strike exclusive deals with major Hollywood talent.

A second thesis about the company’s growth and success, however, might consider Netflix’s unusual focus on big data. By definition, big data is data that surpasses the processing capacity of conventional database systems and is received at such a high volume that it requires an elaborate system of collection and analysis to fully understand it. Using tools such as Hadoop, Pig, Python, Cassandra, Hive, Presto, Teradata and Redshift, Netflix is able to process 10+ petabytes of data along with 400+ billion new events on a daily basis in order to learn about its users’ viewing habits (Wylie). This data records how many people are using the service at a particular time, what texts those viewers are watching, how viewers rate the programs they’ve watched, and what sort of device they’ve accessed Netflix from (there are over 1,000 options). The company, however, also logs a plethora of surprisingly minute and individualized activity. It tracks at what precise point in a film a viewer paused or stopped watching. It tracks when a viewer fast-forwards or rewinds through a scene, and through algorithms, its data scientists can learn what kind of scene that was (something sexy? something violent? something starring Nordic hunk Alexander Skarsgard?). It also tracks what types of programs subscribers watch on different days of the week, what programs are popular in particular zip codes, and what color browsing poster a viewer is most likely to select from the home recommendation screen.

From Netflix’s standpoint, the collection and analysis of big data is key to its business success. The company depends on being able to deliver a highly personalized recommendation system to its subscribers and to make well-informed content acquisition decisions. But its dependence on big data has also significantly changed the way that media texts are consumed, produced, exhibited, and valued, causing profound disruptions in the entertainment industry. Surprisingly, however, little academic attention has been paid to the role of big data in the entertainment industry. Part of the problem is that companies like Netflix protect their proprietary data by restricting access to those who work inside the company (with the exception of the Netflix Prize, as discussed later). Indeed, as Boyd and Crawford write, most media companies prohibit access to their data sets by non-employees while others will sell some data for a fee.

“This produces considerable unevenness in the system: those with money—or those inside the company—can produce a different type of research than those outside. …."

“[T]hose without access can neither reproduce nor evaluate the methodological claims of those who have privileged access” (673).

In other words, because Netflix refuses to release its algorithmic formulas and data sets, academic researchers have had very limited access to data about the company’s viewers so as to evaluate how viewer behavior is affecting industry shifts. That doesn’t mean, however, that nothing can be known about relations between big data, Netflix, and industry changes. Indeed, by drawing on tech reports, recent literature, interviews and public talks with Netflix and other industry executives, I explore here big data’s role throughout the major stages of Netflix’s corporate development. My goal is to understand how data collection has helped the company disrupt almost every aspect of the media production cycle and to assess what the potential effects on the industry might be. While my research is necessarily limited by Netflix’s data restrictions, I nonetheless attempt to provide a foundational production-economy perspective on one of today’s game-changing entertainment companies.

In the beginning: big data, red envelopes and Cinematch

Netflix was the brainchild of two Silicon Valley business partners, Reed Hastings and Marc Randolph, who wanted to create “the next Amazon.com of something” in the late 1990s (Keating 13). To many, the duo seemed an unlikely pair in the business venture world, with Hastings possessing a colder “supercomputer mind” that thrived on perfecting business plans, while Randolph excelled as a more charismatic leader who used his talents to sell his products to others (15).

Eventually, the pair agreed to develop an Internet-based, movie subscription service that rented DVDs to customers through the mail in its now famous red envelopes. To first get their start-up off the ground, however, Hastings and Randolph had to figure out how to convince customers to abandon their local, familiar video store for one that only existed in cyberspace and took several days to deliver a movie. Their plan revolved around four basic principles: boast the largest selection of videos in the world, let subscribers keep their movies as long as they wanted without ever suffering a late fee, deliver products reliably and quickly, and eventually, create a highly personalized recommendation system that could outperform the video store clerk.

Hastings’ penchant for computer algorithms also meant that big data would be central to Netflix’s business plan from the very start. For example, when the data revealed that Netflix’s ability to deliver a film to consumers within 24 hours had an undeniable correlation to an increased rate of new customer sign ups in that same area, it dramatically changed its distribution plan. Rather than continue to ship all of its DVDs from a single warehouse in San Jose, California, the company developed a software program that plugged in each of its customers’ addresses to see where it should build not just one, but multiple distribution centers to allow for most subscribers to get their DVDs within 24 hours of Netflix shipping the order. At the time, the multi-center fulfillment structure was somewhat radical in the business world, but it ultimately proved a success as the faster delivery times eventually led to word-of-mouth advertising and a reduced cost associated with signing up new subscribers (Keating 56-57).

Such meticulous attention to shipment speeds explains why Gian Gonzaga, Director of Data Science at Netflix, argues that during the mail-order stage of the company’s development, it wasn’t really an entertainment company:

“it was a logistics company that just happened to sell entertainment.”

Its primary business questions centered around which titles it needed to stock, how many of those titles it needed in its inventory, and what the best way was to get those titles to the right person in the right amount of time.

But within three years of going live, Netflix realized that shipping speeds and a growing catalog would simply not be enough. When Netflix first started, it did not provide its viewers with a recommendation feature. All it offered was a search engine that could locate films by keywords, provide links to movie ratings and synopses, and allow users to enter a favorite film in order to find similar titles. When Netflix’s catalog offered fewer than a thousand titles, this system worked reasonably well. But as its catalog grew, Hastings said a recommendation system became “critical” because “people have limited cognitive time they want to spend on picking a movie” (Thompson). The company also knew that it would need to create an engine that could outperform the video clerk, who was able to give customers ideas about what they might like to watch in order to keep business active.

To do this, Netflix started by asking subscribers who rented a movie to rate it and then used this data to make future recommendations by comparing highly rated films to other ones in their catalog with similar attributes (such as genre, theme or talent). The only problem, according to Hastings, was that this early system gave “a mix of insightful and bonehead recommendations” and ultimately proved ineffective at predicting customer preferences (Thompson). For instance, its engineers could not figure out why Pretty Woman and American Gigolo rarely appealed to the same audiences “even though both were movies about prostitution, starring Richard Gere, and set in a major U.S. city” (Keating 191).
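The attribute-matching approach described above can be sketched roughly as follows. This is a minimal illustration, not Netflix’s actual code: the attribute sets, the function names and the Jaccard overlap score are all assumptions chosen for simplicity. It shows why two films that share a star, a theme and a setting score as near neighbors even when their audiences differ.

```python
# Hypothetical attribute tags; the real catalog metadata is not public.
CATALOG = {
    "Pretty Woman": {"prostitution", "Richard Gere", "major U.S. city", "romantic comedy"},
    "American Gigolo": {"prostitution", "Richard Gere", "major U.S. city", "crime drama"},
    "I, Robot": {"science fiction", "Will Smith", "major U.S. city"},
}


def jaccard(a, b):
    """Share of attributes two films have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b)


def most_similar(title, catalog=CATALOG):
    """Rank every other title by attribute overlap with `title`."""
    scores = {
        other: jaccard(catalog[title], attrs)
        for other, attrs in catalog.items()
        if other != title
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# American Gigolo ranks first because it shares three of five attributes.
print(most_similar("Pretty Woman"))
```

A score built only from shared attributes cannot see that the two Gere films differ sharply in tone, which is consistent with the “bonehead” recommendations Hastings describes.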

To improve their recommendation system, the company turned to its big data engineers who began to cluster customers together who rated movies similarly and to then present “films highly rated by cluster members to others in the same cluster” (62). By doing this, Netflix’s recommendation system, called Cinematch, could tease out some fairly nuanced connections that few Blockbuster video clerks could ever know. For example, it found that people who enjoy the historical war movie The Patriot also tend to like Pearl Harbor, but it also discovered that those very same folks also like the science-fiction movie I, Robot and the emotional drama Pay It Forward (Thompson).
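The clustering idea can be illustrated with a small sketch. It assumes invented ratings and uses a simple nearest-neighbor grouping (users whose shared ratings differ by at most one star on average) as a stand-in for whatever clustering method Cinematch actually used, which the sources here do not specify.

```python
# Hypothetical 1-5 star ratings; user names and values are invented.
RATINGS = {
    "ann": {"The Patriot": 5, "Pearl Harbor": 4, "I, Robot": 5},
    "bob": {"The Patriot": 5, "Pearl Harbor": 5, "Pay It Forward": 4},
    "cara": {"The Patriot": 1, "Pearl Harbor": 2, "Annie Hall": 5},
}


def mean_gap(a, b):
    """Average absolute rating difference on films both users rated."""
    shared = a.keys() & b.keys()
    if not shared:
        return None
    return sum(abs(a[f] - b[f]) for f in shared) / len(shared)


def cluster_of(user, ratings=RATINGS, max_gap=1.0):
    """Other users whose ratings sit within `max_gap` stars of `user` on average."""
    members = []
    for other, theirs in ratings.items():
        if other == user:
            continue
        gap = mean_gap(ratings[user], theirs)
        if gap is not None and gap <= max_gap:
            members.append(other)
    return members


def recommend(user, ratings=RATINGS, min_stars=4):
    """Films the user has not rated that cluster members rated highly."""
    seen = ratings[user]
    picks = set()
    for member in cluster_of(user, ratings):
        for film, stars in ratings[member].items():
            if film not in seen and stars >= min_stars:
                picks.add(film)
    return sorted(picks)


print(recommend("ann"))
```

Note that nothing in this sketch looks at genre or cast: the link between a war film and an emotional drama emerges only from who rated what, which is how such a system can surface connections that attribute matching would miss.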


To make further advances, Netflix placed even more trust in the power of data scientists when it announced an open-door contest in 2006 that offered one million dollars to anyone who could improve the Cinematch system by 10%. In the Netflix world, that translated into consistently predicting a subscriber’s movie ratings to within one half to three-quarters star on Netflix’s five-star system (Keating 187). Over 40,000 teams or individuals from 186 countries joined the three-year contest, using Netflix’s data set of one million subscriber movie ratings to test their equations (188). In the end, a team called BellKor’s Pragmatic Chaos took home the grand prize; they, like the other data miners, had begun to look at relations between movies and viewers in more nuanced ways. While Netflix never actually implemented their final algorithm, the contest produced valuable insights into viewing behavior and algorithm construction. For instance, the competing teams used the data sets to see if subscribers were more generous when rating movies on weekends rather than on weekdays and what effect rating a lot of movies at once had on the process. They were also able to demonstrate that subscribers who liked, say, Woody Allen films, tended only to care for a certain type of his films and did not recommend his other works. Interestingly, they also showed that a small subset of films (that were usually ironic or polemic in nature) were simply beyond reliable predictions. (The most troublesome film proved to be Napoleon Dynamite, as subscribers were sharply divided on whether that film was the product of creative geniuses or mass-produced crap; no model generated was ever able to reliably predict why people would rate it the way they did (Keating 193).)
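The Netflix Prize scored entries by root-mean-square error (RMSE) between predicted and actual ratings, so a 10% improvement meant an RMSE 10% lower than Cinematch’s. The sketch below shows how such a comparison works; the ratings, predictions and function names are invented for illustration and are not contest data.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual star ratings."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def improvement(baseline_rmse, candidate_rmse):
    """Fractional reduction in error relative to the baseline."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse


# Invented example: five true ratings and two sets of predictions.
actual = [4, 3, 5, 2, 1]
baseline = [3.0, 4.0, 4.0, 3.0, 2.0]   # every guess off by one star
candidate = [3.5, 3.5, 4.5, 2.5, 1.5]  # every guess off by half a star

print(rmse(baseline, actual), rmse(candidate, actual))
print(improvement(rmse(baseline, actual), rmse(candidate, actual)))
```

Because the errors are squared before averaging, a few badly missed films (such as the polarizing titles discussed above) weigh on the score far more than many small misses.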

Today, Netflix’s recommendation system is even more sophisticated and, according to Kelly Uphoff, Director of Experimentation and Algorithms for Growth and Targeting at Netflix, it is now responsible for generating 75% of all movie and TV choices that users make on the site. In part, the system has grown more sophisticated since movies are no longer just tagged by actors, directors, settings or genres. Netflix now hires independent contractors to watch collectively every movie in its catalog and at least three episodes of every TV series. These reviewers then pick from more than 1,000 tags to describe the texts they’ve watched, including their genre, setting, time period, sexual suggestiveness, gore, romance level, mood, plot conclusiveness and even the protagonist’s moral integrity. Netflix then uses these tags to classify its films into micro-genres that are sometimes so specific they border on the absurd. In fact, in early 2014, the company had already generated 76,897 possible micro-genres to recommend to viewers, such as Emotional Fight-the-System Documentaries, Period Pieces About Royalty Based on Real Life, and Children and Family Movies Starring the Muppets (Madrigal). (To see the micro-genres Netflix has generated, you can log into your account and then type in the following address: https://www.netflix.com/browse/genre/1. The 1 at the end of the URL is the number that corresponds to a particular micro-genre. In this case, the 1 will link to “African-American Crime Documentaries,” but changing the number at the end of the URL will change the genre it links to. For example, agid=76102 links to “Gritty Zombie Horror Movies.”)
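To see how a modest tag vocabulary can multiply into tens of thousands of micro-genres, consider the rough sketch below. The three tag lists and the naming rule are hypothetical (Netflix’s actual vocabulary and grammar are not reproduced here); only the URL pattern comes from the address quoted above.

```python
from itertools import product

# Hypothetical tag vocabularies; the real one is said to exceed 1,000 tags.
MOODS = ["Emotional", "Gritty", "Witty"]
GENRES = ["Documentaries", "Horror Movies", "Period Pieces"]
SUBJECTS = ["", "About Royalty", "Based on Real Life"]  # "" means no subject


def micro_genres():
    """Yield every mood + genre + optional subject combination as a label."""
    for mood, genre, subject in product(MOODS, GENRES, SUBJECTS):
        yield " ".join(part for part in (mood, genre, subject) if part)


def genre_url(genre_id):
    """Build a browse URL following the pattern quoted in the text."""
    return f"https://www.netflix.com/browse/genre/{genre_id}"


names = list(micro_genres())
print(len(names))   # 3 * 3 * 3 = 27 labels from nine tags
print(names[:3])
print(genre_url(1))
```

Each additional tag dimension multiplies the number of possible labels, which suggests how a figure like 76,897 can arise without anyone writing the genres by hand.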

As Alexis Madrigal writes, such a detailed tagging system, when combined with millions of users’ viewing habits, “becomes Netflix's competitive advantage.” This is because the company's main business goal is to gain and retain subscribers, and recommending precise genres to people on their home screen is “a key part of that strategy.” In fact, the company’s data shows that member retention improves when it places the most tailored or specific rows of genre recommendations higher on the user’s home screen instead of lower (Madrigal).

A screenshot of a subscriber’s Netflix home screen, showing the types of genre categories Netflix constructs for viewer recommendations. 

In total, the sophistication of Netflix’s Cinematch system, which was driven by the complex analysis of big data sets, helped revolutionize the way people rented movies. The company’s ability to help viewers find films they wanted and to deliver them overnight lured millions of subscribers away from the video store where so many others had failed. Indeed, the arrival of Netflix is largely credited with the demise of Blockbuster, the once ubiquitous chain of brick-and-mortar video rental stores. Its costly retail space, poor customer service, late fees and limited selection simply could not compete with Netflix’s lower-cost model, broader catalog, and highly sophisticated recommendation system. Blockbuster was also too slow to recognize the threat that Netflix posed. With the early failure of Hollywood Video’s e-commerce site, Reel.com, Blockbuster believed that the online video rental business was not a viable threat. “With this reassurance, Blockbuster remained conspicuously slow in its own attempts to develop online strategies” and remained reluctant to invest in a rent-by-mail business model (McDonald). By the time Blockbuster launched its own digital streaming and online rental service in the mid-2000s, a limited marketing budget, long waits for popular DVDs to become available, and a hastily configured distribution system that slowed delivery times crushed customer retention rates (Keating 156). In December 2004, Blockbuster also made the “fatal” mistake of eliminating all of its late fees in order to build customer rapport and to directly compete with Netflix, but in the process the company forfeited approximately $400 million in late fees during a cash-strapped period, a financial hole from which it never recovered (McDonald). In the end, the corporate Goliath simply could not compete with the smaller, nimbler David, and it finally filed for bankruptcy in 2010.


Revolutionizing entertainment in the streaming age

While Netflix’s Cinematch would eventually make reliable recommendations, it also allowed the company to mask some of its weaknesses. If a popular title was low in stock, for example, Cinematch would stop suggesting the title until more copies of the DVD became available. The problem of stocking the right amount of inventory to meet viewer demand was always a challenge. It became exacerbated when Netflix entered its second and third stages of corporate development, which occurred when it launched a video streaming service in January 2007 and began developing its own original content in 2012. These new endeavors profoundly changed the company’s business model by shifting its focus away from that of a logistics company and more towards that of a media channel and studio. As a result, Netflix had to think much more carefully about what its subscribers wanted to watch, which titles it should license for streaming, and which titles it should create.

To better understand the challenge of this shift, consider that during its mail delivery age, Netflix was able to simply purchase physical DVDs, which in essence bought them a license to use those discs in perpetuity. So long as the disc didn’t break, the company could send that movie out to as many people over the course of as many years as it liked. Netflix also did not need to strike complicated deals for its DVD purchasing practices because it was allowed to purchase a disc as soon as it was released to the public, no matter who released it.

Streaming licenses work much differently because they operate under the assumption that (in this case) everyone with a Netflix subscription can watch the title, making a single streaming license for a single work dramatically higher in cost than a set of DVDs (Gonzaga). For example, when Netflix agreed to purchase the streaming rights to all seven seasons of AMC’s dramatic series Mad Men, it paid out nearly $1 million for each of the show’s 92 episodes (Nordyke and Rose). Given that one can purchase all seven seasons of Mad Men on DVD for roughly $70, Netflix could have purchased over 1.3 million box sets of the series (or roughly 260 million individual DVDs) for the same price without ever having to worry about those discs disappearing from its catalog once the license expired.
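The box-set comparison can be checked with a few lines of arithmetic. The sketch below rounds the per-episode fee up to a full $1 million and takes $70 as the box-set price, both approximations drawn from the figures above; the variable names are illustrative.

```python
EPISODES = 92
COST_PER_EPISODE = 1_000_000  # "nearly $1 million", rounded up
BOX_SET_PRICE = 70            # approximate price of all seven seasons on DVD

license_cost = EPISODES * COST_PER_EPISODE
box_sets = license_cost // BOX_SET_PRICE  # whole box sets the same money buys

print(license_cost)  # 92000000
print(box_sets)      # 1314285
```

The per-disc figure depends on how many discs a box set holds, which is not given here, so the sketch stops at box sets.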

In addition to the higher price tag, studios are also much pickier about whom they will release streaming rights to. Also, they tend to sell groups of products together in exclusive deals as opposed to just singular titles. Thus, users in the United States may notice that Netflix carries most Showtime titles in its streaming library, but carries almost no HBO titles, which are only streamed through Amazon Instant Video or HBO Go, even though Netflix can offer titles like HBO’s Game of Thrones through its DVD service. The higher costs of streaming licenses and companies’ preference to bundle titles in exclusive deals mean that Netflix’s options are significantly more limited, presenting the company with the challenge of satisfying many different types of viewers with a narrower catalog since it simply does not have the budget to purchase as widely as it does in the DVD market. To help it navigate these new challenges, the company, of course, still relies on its data scientists, but their job is now perhaps slightly easier.

Using big data for content development

One of the best things about the streaming era from a big data perspective is that users are now doing everything online and every interaction with a site can be logged, tracked and analyzed. In fact, Netflix’s data scientists now argue that people don’t really need to rate movies anymore in order for it to make accurate recommendations for them. According to Gonzaga, the company can now monitor

“what shows and movies you watch and how you watch them to figure out which selections were memorable and how to duplicate that experiences with [other] films available in their streaming library.”

In other words, the company can now analyze a viewer’s behavior to understand that she enjoys watching comedies on Tuesday nights and dramas on Saturday evenings, that she rewatches movies featuring Jodie Foster, suggesting Foster is a favorite actor, and that she stops watching films once high levels of nudity appear. It can then use that data to make reliable recommendations on different days of the week, accordingly.
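A toy version of this kind of behavioral inference might look like the following. The viewing log, its fields and the function name are all hypothetical; the sketch only shows how finished viewings, grouped by day, could yield a day-specific genre preference without any explicit star rating.

```python
from collections import Counter, defaultdict

# Hypothetical viewing log: (day of week, genre, watched to the end?)
VIEW_LOG = [
    ("Tue", "comedy", True), ("Tue", "comedy", True), ("Tue", "drama", False),
    ("Sat", "drama", True), ("Sat", "drama", True), ("Sat", "comedy", False),
]


def favorite_genre_by_day(log):
    """Most-finished genre for each day; abandoned viewings are ignored."""
    counts = defaultdict(Counter)
    for day, genre, finished in log:
        if finished:
            counts[day][genre] += 1
    return {day: c.most_common(1)[0][0] for day, c in counts.items()}


print(favorite_genre_by_day(VIEW_LOG))
```

Treating an abandoned viewing as a non-vote is a simplifying assumption; a fuller model might count it as negative evidence, as in the nudity example above.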

Such a nuanced level of understanding of its users also allows Netflix to analyze large-scale viewer preferences and use that knowledge to purchase texts that best match subscriber demands, either through content licensing or commissioning original content. The use of big data sets, Gonzaga notes, “set us up to have more success in product development” since, for instance, the company can see when we “have already licensed all the good WWII documentaries and [when] our users have burned through them.” Thus, he stated, when the data scientists see a gap in the catalog between what Netflix offers and what its viewers want to watch, the company now has the power to create its own content to fill that gap and feel confident that it will find an audience.

Making a WWII documentary, however, is not as financially risky as many of the deals that Netflix has recently engaged in, some of which have seemed foolish at worst, or radical at best, but all of which are changing how film and television series are made. Take for example the company’s recent decision to sign Adam Sandler and his Happy Madison Productions to an exclusive four-year, four-movie deal. Historically, Sandler has been known for his comedies that employ adolescent and physical humor, with his top grossing films including The Waterboy (1998), Big Daddy (1999), Anger Management (2003), and The Longest Yard (2005). But in the last decade, many of Sandler’s films have been considered box office flops, including Blended (2014), which grossed just $46 million over its lifetime, That’s My Boy (2012) with $36 million, and Men, Women and Children (2014) with a very paltry $705,000. Because of the downward trajectory of his box office pull, the decision to choose Sandler as one of Netflix’s cornerstones of its content development strategy perplexed many.

Nonetheless, Ted Sarandos, Netflix’s Chief Content Officer, believes that the deal is a smart one. Sandler has put out nearly one film each year for the last twenty years, most of which are in Netflix’s catalog. As such, the company had a wealth of data showing Sandler’s movies are “uniquely” popular on the streaming service across all international markets and that his movies tend to be repeatedly watched by subscribers (Kilday). Box office data also shows that while Sandler’s most recent films have tanked domestically, they still have significant global appeal. Sixty percent of Blended’s revenue, for instance, came from overseas, and Pixels (2015), his most recent release, is likely to perform similarly. Sarandos points out that as Netflix grows internationally, it needs to strike content deals with stars who appeal transculturally. Thus, while conventional wisdom holds that U.S. comedy as a genre does not travel well across cultures, in the specific case of Adam Sandler, the data suggests that wisdom is wrong (“NATPE 2015”). Sarandos believes that his deal with the star will serve Netflix’s service and its emerging markets particularly well. This logic is further augmented by the fact that Netflix will own the rights to each film, meaning the films will remain in its catalog without ever being subject to an increase in price or a license expiration date. Netflix may also sell those movies on DVD or Blu-ray if it ever sees fit.

While it is highly unlikely that Sandler’s films will win Netflix critical awards, the company now has over sixty million subscribers, making it clear that it has the muscle to strike exclusive content deals with prolific stars. This practice may shift feature film development away from traditional studios and theaters. Indeed, the Sandler deal was intentionally designed to upset how films are released because, as Sarandos notes, we now live in an on-demand society that can consume content when and where we want, but the one exception has been movie going. He states that viewers simply do not have enough weekends to watch every movie that is released in theaters. But home viewers still have to wait, on average, between 6 and 12 months before they can watch the content they missed at home. This “antiquated” system of windowing, Sarandos states, works against consumers and Netflix. So in order to change when and how movies can be watched, he had to develop a product that he could control start to finish (“Keynote”).

Consequently, theaters are scared by the threat Netflix poses. When the company announced its very first movie deal in 2014 (a sequel to Crouching Tiger, Hidden Dragon), it also revealed its plans to premiere the film on the Netflix site at the same time it would be released in the IMAX format in theaters across the globe. Regal, AMC, Carmike and Cinemark quickly refused to show the sequel on their IMAX screens, as did Canada's Cineplex and Europe's Cineworld. The IMAX film company seemed content to screen the movie mostly in China, where Variety said it would have over 200 locations by the time the movie was released (and where Netflix did not yet offer service), but theater owners’ refusal to play day-and-date releases is telling (Lang). While Sarandos admits that he struck the deal with IMAX to brand Netflix movies as “big” gorgeous films, rather than small, made-for-TV movies (“NEXT”), theaters sensed that Netflix’s ability to strike day-and-date deals set a very dangerous precedent since it could draw away theater viewers in potentially significant numbers. In order to deter film producers like the Weinsteins from striking these deals with Netflix, movie theater owners can only boycott the films in order to protect the theatrical screening status quo. But as the Sandler deal demonstrates, Netflix has already found a way to circumvent even an industry-wide boycott by bypassing the theater system altogether.

As best exemplified by its original television series, House of Cards (2013), Netflix is upsetting the television industry in similar ways. Directed by David Fincher, House of Cards is essentially a modern remake of a BBC series, starring Kevin Spacey as Francis Underwood, a rising politician in the cutthroat world of Washington, D.C. politics. Fincher had pitched the series to HBO, Showtime, AMC and Netflix, with the latter outbidding its competitors by offering to buy two seasons of the show for $100 million without ever seeing a pilot. In essence, House of Cards was the first television show “developed with the aid of big data algorithms” (Satell). Those algorithms indicated that a healthy share of Netflix viewers had already streamed Fincher’s work from beginning to end, that Kevin Spacey films had always done well on the site, and that many viewers enjoyed the British version of House of Cards.

“With those three circles of interest, Netflix was able to find a Venn diagram intersection that suggested buying the series would be a very good bet on original programming” (Carr).
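The “Venn diagram intersection” Carr describes amounts to a set intersection over three audiences. The sketch below uses invented subscriber IDs and variable names, since Netflix’s actual audience data is not public, to show how the overlap and its relative size could be computed.

```python
# Hypothetical subscriber IDs for each of the three "circles of interest".
fincher_finishers = {"u1", "u2", "u3", "u5"}   # streamed Fincher's work to the end
spacey_fans = {"u2", "u3", "u4", "u5"}         # watched Kevin Spacey films
uk_series_viewers = {"u3", "u5", "u6"}         # watched the British House of Cards

# Subscribers in all three circles: the center of the Venn diagram.
core_audience = fincher_finishers & spacey_fans & uk_series_viewers

# Everyone in at least one circle, used as the denominator.
interested = fincher_finishers | spacey_fans | uk_series_viewers
overlap_share = len(core_audience) / len(interested)

print(sorted(core_audience))
print(overlap_share)
```

How large the overlap has to be before a purchase counts as “a very good bet” is a business judgment that the sources do not quantify.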

One hundred million dollars, however, is a lot to invest in a project that hasn’t even made it to the pilot stage, and Netflix’s decision to purchase it sight-unseen marked a significant and radical disruption as to how television series are usually developed. In the traditional model, a network will hear several pitches from producers, put up a limited amount of money to develop some of those pitches into pilots, and then choose which of those pilots to actually put on the air in series form. If the show does poorly, the network can cancel the show mid-season. It is a long-standing model designed to reduce the financial risk of television development. Netflix has bucked this system entirely, largely because of the way it wants to distribute original series.