The Evolution of Data and the Explosion of Analytics: Harnessing the Most Powerful Resource of the 21st Century

A case for implementing “small data” revisions, “big data” conventions, predictive analytics, and process automation in Financials.

The actuarial ability of insurance is the most salient skill for its survival. From the industry’s merchant-ship beginnings in the 2nd and 3rd centuries BCE to today, insurance has been the inception point for many significant developments in analytics and modeling. Mostly analog throughout its history, it has no dearth of equations. From Benjamin Gompertz’s mortality model for life insurance to the five-factor model of car insurance, the industry was built on equations, models, and data that helped explain the world on an aggregate level. For quite some time, that was the best analytics could hope for: to predict general trends and to develop rules that apply broadly. Take car insurance, for example. It used to rely on only a small number of independent variables: age, gender, citation history, accident history, location, and vehicle make and model. Four factors were self-reported on the application and the others were readily available in public records. This model remained almost entirely unchanged from its inception in the 1930s, stagnating for almost eight decades. Over time, loyalty offers and extended time without accidents entered into calculations, but no real changes took hold. This model might be good at predicting aggregate trends, but individual accounts may hide large amounts of asymmetric information. Imagine a young, cautious driver who never speeds. Now imagine an older driver, like Nobel Prize-winning economist Milton Friedman, who sped everywhere he went and thought of most traffic laws as useless. In the traditional model, the young, cautious driver would pay more. Effectively, the younger driver was paying for the risks that Friedman incurred. This shifting of risk and reward works on an aggregate scale, and it would suffice if insurance were a monopolistic and mandated enterprise. In that case, there may not be any need to look further than these aggregate trends. 
However, insurance, like all industries, looks for competitive advantages, and the best way to attract new, lower risk clients is to offer something other companies can’t. In this case, it’s fair and effective pricing structures. Therefore, the model must change. It must ask newer questions that reach outside of the traditional data sources.

Insurance had to make the leap from small data, stored internally, to big data. To take a step forward and gain a competitive advantage, these companies needed to predict who was individually more at risk. Speeding, accelerating through intersections, running red lights, excessive caution and slowness, and propensity to swerve were metrics of driving that needed to be captured. However, the necessary insight didn’t stop there. Companies could gain additional value by seeing who checked into bars or restaurants on Facebook late at night, by looking to see if an owner of multiple cars drove each differently, or by looking at who texts and drives. All of these sources of data needed to be gathered through new initiatives from insurance providers, data feeds from sites like Facebook, and data agreements with diagnostic systems like OnStar. A prime example is Progressive’s Snapshot Pay As You Drive (PAYD) program, which passively collects driving data and phone usage data through an app. It captures speeding, braking, distance driven, location data, phone usage, and whether you are driving during high-risk hours (12am to 4am). By taking this data and building out a predictive analytics model, Progressive can price on an individual level, gaining a competitive advantage over the traditional model. Big data like this has been impacting many insurers over the past two decades, and the industry has become an early adopter of the powerful insights it can provide. Progressive has reaped the benefits of being a first mover: from 2009 to 2018 it grew its market share by 70% according to the National Association of Insurance Commissioners (NAIC), passing giants like Allstate, Travelers, and Nationwide. The move from small data to big data, predictive analytics, and robotic process automation (RPA) is a strategy, and when deployed effectively it can generate astounding results.
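
To make individual-level pricing concrete, the short Python sketch below turns the kinds of signals a usage-based program collects (speeding, braking, phone usage, late-night driving) into a premium adjustment. It is an illustration only, not Progressive’s model: the `TripSummary` fields, the weights, and the 0.7 to 1.5 bounds are all assumptions chosen for readability, and a real insurer would fit such values from claims data.

```python
from dataclasses import dataclass


@dataclass
class TripSummary:
    """Hypothetical per-driver telematics summary for one policy period."""
    miles_driven: float
    speeding_events: int      # times the driver exceeded the posted limit
    hard_brakes: int          # abrupt deceleration events
    phone_use_minutes: float  # handheld phone use while the car is moving
    late_night_miles: float   # miles driven between 12am and 4am


# Illustrative weights only (assumed); not taken from any insurer's model.
WEIGHTS = {
    "speeding_per_100mi": 0.04,
    "hard_brakes_per_100mi": 0.03,
    "phone_minutes_per_100mi": 0.02,
    "late_night_share": 0.50,
}


def risk_multiplier(trip: TripSummary) -> float:
    """Return a factor applied to the base premium, clamped to [0.7, 1.5]."""
    if trip.miles_driven <= 0:
        return 1.0  # no driving data: fall back to the aggregate price
    per_100 = 100.0 / trip.miles_driven
    score = (
        WEIGHTS["speeding_per_100mi"] * trip.speeding_events * per_100
        + WEIGHTS["hard_brakes_per_100mi"] * trip.hard_brakes * per_100
        + WEIGHTS["phone_minutes_per_100mi"] * trip.phone_use_minutes * per_100
        + WEIGHTS["late_night_share"] * trip.late_night_miles / trip.miles_driven
    )
    # A clean record earns up to a 30% discount; surcharges cap at 50%.
    return max(0.7, min(1.5, 0.7 + score))


def individual_premium(base_premium: float, trip: TripSummary) -> float:
    """Scale the aggregate (traditional) premium by the driver's own behavior."""
    return round(base_premium * risk_multiplier(trip), 2)


cautious = TripSummary(1000, 0, 0, 0.0, 0.0)
risky = TripSummary(1000, 20, 30, 40.0, 100.0)
print(individual_premium(1000.0, cautious))  # 700.0
print(individual_premium(1000.0, risky))     # 1000.0
```

Under these assumed weights, two drivers who would have received the same aggregate price now pay different premiums, which is the competitive lever described above.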

For many companies, their evolution with data didn’t begin until the 1980s, when Manufacturing Resource Planning (MRP) and Enterprise Resource Planning (ERP) systems began to automate standard business processes. These user-facing, front-end interfaces created a consistent stream of data, making it readily available for the first time. After the explosion of commercial off-the-shelf (COTS) systems, relational databases with enhanced analytical capabilities followed in the subsequent decades. Throughout the 1990s and 2000s, many firms began to explore data, with one notable caveat: they focused almost exclusively on data generated by their own hands, i.e., their internal systems. Within that limited scope, each department in an organization tended to emphasize its own specific application needs and data management requirements. This has led organizations to focus on managing the needs of a specific application over a global, organizational strategy to effectively manage their data. Consequently, the breadth of organizational data has been either underestimated or mismanaged, leaving many organizations below their possible frontier. A strong internal foundation is key to fully realizing the high-value capabilities of a coherent strategy like Progressive’s. However, many individuals and companies are still struggling to gain a firm grasp on the internal structure and maintenance required for small data. Big data, machine learning, predictive analytics, process automation, and improved information accuracy are all pieces of a larger data management and analytics strategy, but the most important piece, and the foundation of that strategy, is internal data policies and practices. If the current analytical strategy feels disjointed or amiss, it’s imperative to consolidate and fix the current environment rather than bolt on additional tools and hope the old ones will no longer matter.

Often, the first step in implementing these powerful big data tools is to investigate the past and make sure the foundation is structurally sound. The current state of analytics should look healthy before looking to advance the agenda to big data. With companies utilizing a varying number of resource platforms and analytical platforms, it is essential to validate that the data is reliable. Inconsistent, incomplete, or duplicated data can perplex the most adept analytical cognoscente. Data inconsistencies can arise from outdated Application Programming Interfaces (APIs) and extract, transform, and load (ETL) processes, or from disconnects between generations of analytical tools. When inefficiencies are found, issues in accuracy and timeliness can cripple an analytical department. At its most efficient, this organizational data serves a descriptive purpose: it can be used to determine what is currently happening across various areas of the company. Without an understanding of the internal picture, any additional analytical capabilities would be severely hindered. Therefore, an effective review of current legacy systems is imperative. Effective enterprise data management should also include considerations around the organization’s structure, culture, and attitude towards adoption. The most current, advanced technology won’t gain any traction unless the organization and individuals are willing to adopt the technology and consider its insights. Oftentimes, this is where organizational change management, human resources, and managers should look to gauge the atmosphere before launching into large data initiatives.
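
A review of the foundation can start small. The Python sketch below, using only the standard library, profiles a set of records for the three problems named above: duplicated keys, duplicates that disagree with one another (inconsistency), and rows missing required values (incompleteness). The vendor records, field names, and the `profile_records` helper are hypothetical and stand in for whatever an organization’s own systems export.

```python
from collections import Counter


def profile_records(records, key_field, required_fields):
    """Summarize duplicate, conflicting, and incomplete rows in dict records."""
    key_counts = Counter(r.get(key_field) for r in records)
    duplicate_keys = sorted(
        k for k, n in key_counts.items() if k is not None and n > 1
    )
    # A duplicated key is "conflicting" if its rows disagree on any field.
    conflicting_keys = sorted(
        k for k in duplicate_keys
        if any(
            len({r.get(f) for r in records if r.get(key_field) == k}) > 1
            for f in required_fields
        )
    )
    incomplete_rows = sum(
        1 for r in records
        if any(r.get(f) in (None, "") for f in required_fields)
    )
    return {
        "rows": len(records),
        "duplicate_keys": duplicate_keys,
        "conflicting_keys": conflicting_keys,
        "incomplete_rows": incomplete_rows,
    }


# Hypothetical vendor master data merged from two systems.
vendors = [
    {"vendor_id": "V001", "name": "Acme Supply", "payment_terms": "NET30"},
    {"vendor_id": "V002", "name": "Birch & Co", "payment_terms": ""},
    {"vendor_id": "V001", "name": "ACME Supply Inc.", "payment_terms": "NET45"},
]

report = profile_records(vendors, "vendor_id", ["name", "payment_terms"])
print(report)
```

Here vendor V001 appears twice with different names and payment terms, and V002 is missing its terms. Counts like these give a simple baseline for judging whether the current environment is healthy enough to build on.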

Once any organizational impediments have been addressed, companies often run into the next logical roadblock: trying to glean additional insight from this historical data. While historical data is accurate in forecasting things like lease expenses for property, plant, and equipment (PP&E), debt coupon and principal payments, and licensing deals, it is inherently limited when any independent variable is introduced. This is typically where the current data analytic capabilities of organizations end. Organizations will often have a good internal picture but are unsure of the next steps to achieve greater insight and understanding of financials, the potential for automation, and the steps required.

One way an organization’s internal picture can be further analyzed, refined, and augmented is to deploy Robotic Process Automation (RPA) and other automation based on predictive analytics. With the pervasiveness of networked devices and the continued digitization of business operations, there is a greater possibility than ever to track, refine, and automate parts of, or entire, business processes. Cisco calculated a total of 18.4 billion devices connected to IP networks in 2018, and projects that number to rise to 29.3 billion by 2023[i]. With that level of digital insight, identifying areas for process improvement no longer depends on personal insights or anecdotes. Machine learning can do that and much more. Combining machine learning capabilities with RPA will allow the automation of processes and standardization of inputs to reduce cost, improve analytics, and minimize manual intervention. These process improvements can reach every corner of the financials scope. Automated invoice processing and payment, treasury transfers, vendor requisition, reconciliation processes, forecasting, and receivables collections are all pushing further into the realm of automation. This can lead to a multitude of benefits, including the reduction of manual errors, greater processing speed, and elimination of repetitive tasks. These improvements can be a boost to employee satisfaction and the bottom line. As far back as 1995, The CPA Journal was noting repetitive tasks as a cause of attrition in financial fields, and in many capacities, it has been seen in areas of the industry long before that[ii]. The improvement of internal data management impacts much more than a company’s queries and reports. Continuing to improve internal data capabilities should be one of the preeminent goals of an effective organization. However, that is no longer the be-all and end-all for an organization’s analytical capabilities. 
Big data and predictive analytics have entered the fold of the most advanced organizations and are now spreading to every corner of the market. First-mover advantages are still possible, but it is more important than ever to act on these advances and gain the competitive advantages big data has to offer.

The general principle of Moore’s Law has often been boiled down to this: computing power doubles and prices are cut in half about every two years[iii]. With that exponential growth in mind, it is no surprise that companies are now able to store previously unheard-of quantities of data and perform previously impossible analytic activities. The process for doing so entails building out new strategies, acquiring new hardware and software, and hiring new personnel[iv]. Traditional Business Intelligence (BI) tools cannot handle the largely unstructured or semi-structured data now used in predictive analytics. Big data has moved to such a large scale that creating a repository to hold internally generated and external data is simply not feasible and, for a traditional relational database, not even possible. This is where having a defined strategy to ask the right questions is essential. These strategies must match the new hardware requirements with personnel who know how to effectively manage it. With the right hardware and personnel in place, machine learning is a must for any robust predictive analytics strategy. Individually built formulas and manual models can no longer outcompete the speed and power of machine learning. While machine learning’s predictive analytics cannot detail the future, it can help shine some light on it. Asking the right questions is like shining a few powerful flashlights very closely together: you can’t illuminate all your surroundings, but the immediate vicinity becomes clearer. With poor strategy, you may have a more powerful bunch of flashlights, but the light is so spread out that the benefit becomes negligible. These tools require newer hardware with new capabilities, and while expenses have decreased, that doesn’t mean they are cheap, so considerations should be formulated and discussed in advance, before diving in headfirst. 
Furthermore, investment in new capabilities starts with the hardware and continues through the development and implementation of a strategy. Only then can you begin to answer key financial and operational questions. The questions being answered by these advanced analytical tools can be used to assess client credit risks, measure vendor financial stability, find the optimal structure to offer dynamic pricing and discounting terms, and even determine optimal investment strategies based on company directives. This can be further extended into business operations and product offers. Predictive analytics is even making headway in human capital management (HCM) and bringing data to hiring decisions, flight risk predictions, and sales incentive structures[v].

While some of the benefits of more advanced analytical strategies are apparent, there are definite difficulties that must be considered. Determining what forms and sources of big data are required is only the first piece; the second is the personnel needed to build and maintain these evolving predictive analytic capabilities. McKinsey & Company was the first organization to predict a data analytics talent shortage in their 2013 eBook, with the initial projection ranging between 140,000 and 190,000 resources[vi]. IBM, whose team displayed the full power of machine learning with Watson on Jeopardy! in 2011, revised that number and put its estimate around 700,000 openings by the end of 2020[vii]. Either number is a clear indicator of the depth of the current personnel shortage. This means that competitive advantages must be in place to lure these talented individuals to the organization. Infrastructure, capabilities, and a clear direction should all be in place when looking to hire the data scientists necessary to fully realize predictive analytic capabilities. Difficulties abound, but the ROI is calculable. Bill Gates posited that a breakthrough in machine learning could be revolutionary, stating, “if you invent a breakthrough in artificial intelligence so machines can learn, that is worth 10 Microsofts.” Incremental improvements, not just breakthroughs, hold substantial value. The value can range in scope and industry. For example, Chase saved millions through a project that detected early loan repayment, Cerner predicts sepsis (a fairly lethal blood infection) based on markers in the health record, and several states have adopted sentencing guidance based on chance of recidivism (the chance an offender will commit another crime) and a multitude of other factors[viii].

Many of these projects see the elimination of heuristics alongside improvements in financial metrics. A rather stark example of a heuristic that can be eliminated comes from Jonathan Levav, an associate professor at Columbia University, who studied judicial rulings and found that “you are anywhere between two to six times as likely to be released if you’re one of the first three prisoners considered versus the last three prisoners [after a lunch break].”[ix] No one would have thought that hunger was a leading indicator of inmate parole decisions prior to these advanced analytics. With the implementation of the project focused on recidivism rate calculations, judges can have a clear input that shows the statistical chance a paroled inmate could reoffend. These judges no longer have to rely exclusively on their proverbial “gut.” Other biases lend themselves to poor business decisions and not just judicial rulings. Perhaps the most prevalent is the availability heuristic, where individuals rely on a mental shortcut that takes the most immediate example in their mind and applies it to the current topic or situation. Similar projects, projects that worked at a previous company, or topics of a book a manager recently read can all impact a decision to a massive extent[x]. Predictive analytics can help reduce those negative biases by taking a large amount of general information and breaking it down to the most likely potential outcome for an individual situation. Through advanced methods like ensemble modeling and meta-learning, dozens of potential models can be aggregated and weighted to give a more in-depth prediction than previously possible. All of these efforts are moving to help managers make more informed decisions. While not a crystal ball, predictive analytics is a powerful tool when incorporated into decision making and can reduce bias, eliminate potential sources of asymmetric information, and drive more informed decisions.
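
The idea behind ensemble modeling can be shown in a few lines of Python. The sketch below combines several models’ probability estimates into one weighted average. The three predictions and their weights are made-up values for illustration. In practice the weights would be learned from validation data (the meta-learning step) rather than set by hand.

```python
def weighted_ensemble(predictions, weights):
    """Combine per-model probabilities into one weighted-average estimate."""
    if not predictions or len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(p * w for p, w in zip(predictions, weights)) / total


# Hypothetical outputs of three models estimating the probability
# that a client pays an invoice late.
model_predictions = [0.20, 0.35, 0.25]
# Assumed weights, e.g. reflecting each model's past accuracy.
model_weights = [0.5, 0.2, 0.3]

combined = weighted_ensemble(model_predictions, model_weights)
print(round(combined, 3))  # 0.245
```

No single model decides the outcome: each estimate is tempered by the others, which is part of why aggregated predictions tend to be more stable than any one hand-built formula.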

Internal BI improvements, robotic process automation, and predictive analytics are all frontiers that interlock. When these technologies are integrated correctly, the companies adopting them reap the rewards, leveraging better insight to make more informed decisions. Developing strategies and roadmaps that efficiently interlock tools, hardware, and people is the key to realizing the greatest benefit possible.

We’re conducting a survey on the future of Data Analytics, Predictive Analytics, and Robotic Process Automation. The data gathered will help us construct a White Paper built around the core, complex issues in these topics. To tailor the objectives and solutions and to provide a useful environmental snapshot, we need your support through this questionnaire. Fill it out anonymously here.




[i] Cisco. 2018. Cisco Annual Internet Report (2018-2023) White Paper. 

[ii] Roth, Patricia G; Roth, Philip L. Sept 1995. The CPA Journal; New York Vol. 65, Iss. 9.

[iii] Rotman, David. February 2020. We’re not prepared for the end of Moore’s Law. MIT Technology Review.

[iv] Simon, Phil. 2013. Too Big to Ignore: The Business Case for Big Data. John Wiley & Sons, Inc. Hoboken, New Jersey.

[v] Siegel, Eric. 2016. Predictive Analytics: The Power to Predict who will Click, Buy, Lie, or Die. John Wiley & Sons, Inc. Hoboken, New Jersey.

[vi] McKinsey & Company. March 2015. Marketing & Sales: Big Data, Analytics, and the Future of Marketing & Sales. McKinsey & Company. New York, NY.

[vii] IBM. 2017. The Quant Crunch: How the Demand for Data Science Skills is Disrupting the Job Market. Burning Glass Technologies. Boston, MA.

[viii] Siegel, Eric. 2016. Predictive Analytics: The Power to Predict who will Click, Buy, Lie, or Die. John Wiley & Sons, Inc. Hoboken, New Jersey.

[ix] Shai Danziger, Jonathan Levav, and Liora Avnaim-Pesso. April 2011. Extraneous Factors in Judicial Decisions. Proceedings of the National Academy of Sciences of the United States of America.

[x] Kahneman, Daniel. October 2011. Thinking, Fast and Slow. Farrar, Straus and Giroux. New York, NY.