Glossary of Terms

  • accuracy: In predictive analytics, accuracy is a measure of a predictive model’s performance. It’s usually expressed as a percentage, calculated by dividing the number of correct predictions by the total number of predictions made.
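
As a quick sketch, accuracy is simply correct predictions divided by total predictions. The helper below (the function name and sample labels are my own, not from any particular library) computes it from paired lists:

```python
def accuracy(predictions, actuals):
    """Fraction of predictions that match the actual outcomes."""
    if len(predictions) != len(actuals):
        raise ValueError("predictions and actuals must be the same length")
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(predictions)

# Example: 3 of 4 predictions are correct -> 0.75 (75%)
print(accuracy(["churn", "stay", "stay", "churn"],
               ["churn", "stay", "churn", "churn"]))
```
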
  • advanced analytics: Advanced analytics is the use of sophisticated quantitative approaches to data that can not only reveal insights, but also provide predictions and offer forecasts. These approaches usually include AI and machine learning, as well as statistical techniques. It can also include prescriptive approaches that offer specific recommendations. Advanced analytics is widely used in many industries, including insurance, e-commerce, retail, banking, healthcare, manufacturing and more.
  • algorithm: In the data context, an algorithm is a set of instructions for a computer to bring in input data, manipulate it, perform calculations with it, and generate output. Algorithms used in data science offer preset methods for analyzing data, identifying patterns, and generating predictions. More than one algorithm can often be used to address a predictive analytics question, and choosing the right algorithm is an important part of the process.
  • analytics: Analytics is a business practice that uses descriptive and visualization techniques to gain insight into data; those insights can then be used to guide business decision making. “Data analytics” as a term does not necessarily include predictive approaches. Instead, data analytics typically focuses on gaining better understanding by using data from the past.
  • annual contract value (ACV): Annual contract value (ACV) is a standard revenue metric for SaaS companies and other subscription-based businesses. It refers to the annualized contract value per customer contract. ACV breaks down the total contract value into an average value per year over the length of the contract.
  • annual recurring revenue (ARR): Annual recurring revenue (ARR) is a financial metric showing the total revenue generated from yearly subscription sales. Companies like SaaS makers and subscription businesses with contract terms exceeding a year prefer the ARR metric over the monthly recurring revenue (MRR) metric.
  • App Event Optimization (AEO): App Event Optimization (AEO) is a type of campaign optimization on Facebook (Meta). The platform looks for users who are most likely to complete specific in-app events for the lowest cost. Typically, the event is an in-app purchase. The platform will find look-alike audiences likely to perform the same action.
  • application programming interface (API): An application programming interface, usually called an API, is a procedure that allows two applications or systems to communicate and share information. In the data world, you might use an API to retrieve data from a cloud server or to load a model’s predictions into other business systems.
  • artificial intelligence (AI): Artificial intelligence (AI) refers to the development of computerized systems that can carry out tasks and perform actions that augment or take the place of human intelligence. Data science and machine learning represent just part of the study and development of AI.
  • attribute vs. feature: Attributes are the various data points or variables within a dataset. Features may be the same as attributes, or they may represent combinations of attributes or calculations done with attributes to generate new data points (i.e., through feature engineering).
  • automated machine learning (AutoML): Automated machine learning (AutoML) turns the process of building, training, and testing machine learning models into an automated routine that can evaluate hundreds or thousands of potential models, much more quickly than a human could.
  • average revenue per user (ARPU): Average revenue per user (ARPU) is a metric that measures the average revenue per active user over a given period. Many businesses offering subscription plans have adopted this metric to assess each user’s value to the company during a specific time period.
  • business analytics: Business analytics is the use of data about a business’s past activities, performance, or transactions to drive analyses that yield practical, useful insights for the business and its decisions.
  • business intelligence (BI): Business intelligence (BI) includes gathering, storing, and analyzing business data, as well as using that analysis to inform the actions of the business.
  • churn detection: Churn detection is the process of identifying customers at risk of churning. A predictive model can look at data about past customers who churned and look for patterns in their behavior that preceded their churning. The model can then look at current customers and try to find similar patterns so you can take action to retain these customers. This approach can reduce customer attrition and boost customer retention.
  • churn prediction: Churn prediction involves building a predictive model based on past customer data. That model will help identify patterns in customer behavior that correlate with churn, allowing for the identification of those patterns in current customer data and an intervention to prevent churn. Ideally, you can boost customer retention with this kind of model, and reduce attrition.
  • classification model: Classification models predict a class or category for each row of data (e.g., for each customer). They analyze data that includes the known category for data from the past, and then can predict which category will best fit future data. For example, a classification model could predict that customers would be most likely to purchase product A, B, or C, given their past transaction history. Churn prediction may also be done with classification models.
  • conversion analytics: Conversion analytics is the analysis of data related to conversions to find useful insights. A conversion in the business context refers to a desired customer action, such as making a purchase or providing contact information. Classification models may be used for this purpose. Predictive analytics can help improve conversion rates by providing deeper insight into what affects customers’ decisions and predicting which customers will respond to which offers.
  • cross-sell model: Cross-sell models are developed based on customer data and can identify which complementary products might most interest a specific customer. The goal of these models is to offer the right customer the right offer at the right time. Predictive analytics can use cross-sell models to improve conversion rates and generate greater revenue.
  • customer acquisition cost (CAC) payback: Customer acquisition cost (CAC) is the total amount spent on acquiring a new customer, including all the sales and marketing efforts required to gain that customer; it’s calculated by dividing total acquisition expenses by the number of customers acquired during a specific time period. CAC payback is the time it takes for the revenue from a new customer to earn back that acquisition cost, typically calculated by dividing CAC by the average monthly revenue the customer generates. It’s critical for companies to monitor and manage CAC payback effectively to ensure their customer acquisition is efficient and consistent.
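
Under one common simplified formula, the payback period divides CAC by margin-adjusted monthly revenue per customer. This sketch assumes that formula; the function name is my own:

```python
def cac_payback_months(cac, monthly_revenue_per_customer, gross_margin=1.0):
    """Months needed for a customer's margin-adjusted revenue to repay CAC."""
    return cac / (monthly_revenue_per_customer * gross_margin)

# Example: $300 to acquire, $50/month revenue at 80% gross margin -> 7.5 months
print(cac_payback_months(300, 50, 0.8))
```
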
  • customer data analytics: Customer data analytics can include both descriptive and predictive approaches to analyzing customers’ interactions with a business. Customer data analytics might address issues like churn detection and prediction, cross-sell and upsell, or customer lifetime value. The insights and predictions can be used to guide business decision making and improve business outcomes.
  • customer data platform (CDP): A customer data platform (CDP) is a single system that holds and organizes all customer data from various sources. The CDP constructs customer profiles and makes that information available to other business technology systems, such as those used for marketing, sales, and customer service.
  • customer lifetime value (CLV or CLTV) prediction: Customer lifetime value (CLV) is the total amount of revenue a business can expect to take in from a specific customer over the entire time period that the customer is actively engaged with the company. Predicting CLV (sometimes called CLTV or pLTV) can highlight customers who might receive special offers and inform strategies to retain the most valuable customers.
  • customer predicted lifetime value (pLTV) prediction: Customer predicted lifetime value (pLTV) is the total amount of revenue a business can predict will be received from a specific customer over the entire time period that the customer is actively engaged with the company. Predicting pLTV (sometimes called CLV, LTV, or CLTV) can highlight customers who should receive special offers and inform strategies to retain the most valuable customers.
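
One common back-of-the-envelope estimate treats expected customer lifetime as the inverse of the monthly churn rate; the sketch below assumes that simplification (real CLV models are considerably richer, and the function name is my own):

```python
def simple_clv(avg_monthly_revenue, monthly_churn_rate):
    """Naive CLV: monthly revenue times expected lifetime (1 / churn rate)."""
    return avg_monthly_revenue / monthly_churn_rate

# Example: $20/month with 5% monthly churn -> $400 expected lifetime revenue
print(simple_clv(20, 0.05))
```
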
  • data blending: Data is often stored in a variety of locations, from cloud databases to on-prem databases to Excel files. A frequent challenge of data projects is combining or “blending” all of those data sources. Many data platforms offer the ability to readily connect to different data sources and import various file types. Also known as data integration or data munging.
  • data clean rooms: Data clean rooms are isolated, secure locations used to store and combine aggregated, anonymized data from multiple sources. They provide additional privacy protection for individual-level data while also allowing marketers to match their first-party data to aggregated data from other sources.
  • data cleaning: Data typically needs some “cleaning” prior to being used in machine learning models. For example, an unusually large number may represent a data entry error, or it could be an outlier that’s unusual but correct. Clean data is essential to high-quality machine learning models. Also known as data cleansing or data preprocessing.
  • data encoding: Data is not always in exactly the right format for predictive modeling. Some data, such as text, may need to be represented differently to be used in a mathematical model. The data encoding process ensures that all data are ready for use in a model.
  • data engineering: Data engineering includes setting up and maintaining systems for gathering and storing data, as well as constructing processes for retrieving data for use in predictive analytics and modeling. Data engineering has become a specialized job of its own at many data-driven companies.
  • data enrichment: Data enrichment involves integrating external data from trusted third-party sources into analytics in ways that complement a company’s internal data. For example, demographic, weather, or public health data can enhance the performance of predictive models. The retrieval and integration of this data can be time-consuming and technically challenging if not handled through automation.
  • data leakage: Data leakage occurs when a machine learning model is trained with information about the target/outcome variable that it will not have when used in production. This typically occurs when a feature is included in the training dataset inappropriately. For example, if you want to predict whether a website visitor will purchase a product using their behavioral and demographic details, but accidentally include a feature reflecting their purchases in the training dataset, the model will “know” information about the visitor’s future that will not be available when you use the model to make predictions about a new visitor. Additionally, the model will seem to perform unusually well because it has been provided information that directly correlates strongly with the target/outcome variable.
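
One-hot encoding is a typical example of representing text categories numerically for a model. Here is a minimal, dependency-free sketch (the function name is my own, and the column order is simply alphabetical, an assumption of this example):

```python
def one_hot_encode(values):
    """Map each distinct category to a 0/1 indicator vector."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

# Columns are alphabetical: "blue", "green", "red"
print(one_hot_encode(["red", "blue", "green"]))
# -> [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
```
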
  • data preparation: Data preparation is a blanket term that covers everything from combining data from different sources and dealing with outliers and missing data to making statistical adjustments and encoding data in the correct formats for predictive modeling. Although this process can be tedious and take a lot of time if conducted by hand, automated processes can build it into a predictive modeling workflow efficiently. Also known as data preprocessing.
  • data science: Data science combines statistics, computer science, scientific methods, and business knowledge to analyze, model, and predict using data. The data science toolkit can be used to analyze all kinds of data, from numerical to text to images. Ideally, the insights and predictions gained from data science are used to enhance business success.
  • data visualization: Data visualization is the communication of data trends and stories in a visual format, such as in a bar chart, line graph, timeline, word cloud, or even a map. “Data viz” should make it easy for the viewer to quickly identify important trends or patterns in data.
  • data wrangling: Data wrangling is a term sometimes used to encompass data blending and data cleansing, suggesting all the forms of manipulation that data might need to be ready for use in a predictive model. Also known as data preprocessing or data preparation.
  • deep learning: Deep learning is a specific area of data science that uses analytic methods based on human brain structure to analyze data and generate predictions. Specifically, this area focuses on algorithms called neural networks that have many layers, which is where the “deep” term comes from. Though especially widely used in areas like image, video, and text analysis, deep learning can be used for many predictive purposes.
  • demand forecasting: Demand forecasting involves trying to determine the likely future need for an item, based on historical data and analytics showing how much of it has been needed in the past. For example, a grocery store needs to know roughly how many loaves of bread to order each week, and can forecast how many will be needed based on prior demand. AI-based demand forecasting can save significant resources through accurate determination of needs. This kind of predictive demand forecasting has been adopted by varied industries including manufacturing, retail, grocery, CPG, and more.
  • demand planning: Demand planning using AI and machine learning is the process of generating forecasts for demand and planning to satisfy that demand most efficiently using available resources. Supply chain demand planning increasingly uses predictive modeling to ensure products and services are allocated and promoted effectively to reduce costs, cut down on environmental impact, and provide greater profit. Data enrichment can bring data about external conditions, like weather or labor availability, into the predictive modeling process for greater accuracy.
  • descriptive analytics: Descriptive analytics is the analysis of data from the past in order to “describe” what has been happening in a business. It typically includes looking for trends or patterns in order to find meaningful insights. Calculating summary statistics, like a mean or median, and creating data visualizations, like scatter plots and bar charts, are frequently used in this kind of analysis. It typically does not include predictive modeling.
  • exploratory data analysis (EDA): Exploratory data analysis (EDA) is an initial stage in the predictive modeling process. In this stage, the analyst looks at statistics representing the distributions of each of the variables, looking for interesting patterns, relationships among variables, outliers, and potential data entry errors, as well as checking basic assumptions about the data. Graphs and other data visualizations are often part of this process.
  • feature engineering: Feature engineering is the process of manipulating and transforming raw data into forms that are more valuable in a predictive model. Datasets offer many options for creating new features, and deciding which ones to create and retain is considered a craft by data scientists. For example, repeated transactions for each customer in your dataset may be more informative to predictive models if they’re used to calculate an average transaction amount for each customer. Along with creating the new features, new labels must be added so the engineered data is understandable to users and meaningful when the model is assessed.
  • feature selection: While it’s great to have a lot of data, not every variable (aka feature) in your dataset will be equally informative in a predictive model. Typically you want to build models using the most valuable features, and omit those that offer less information for the predictions or that are redundant. Feature selection is the process of determining the value of each variable to the model and deciding which variables to keep in the model.
  • first-party data: First-party data is the data that a company collects itself instead of acquiring it from other sources. For example, data on visits to the company’s own website, from newsletter subscribers, and from webinar attendees all can contribute to a robust first-party data repository.
  • imputation: Missing data can sometimes pose a problem for predictive modeling. A process called imputation will replace those missing data points with “best guesses.” Depending on the reasons for the missing data, different methods can be selected for imputation. This process can also be automated to make data preparation easier.
  • lead scoring: Lead scoring is a method of predicting the chance a new lead (prospective customer) will become an actual customer. Each lead is assigned a score that reflects how likely they are to become a customer. Scores are assigned based on information about the lead, such as information they provide, behavioral data, firmographic data about their company, and other relevant data. These scores can be used to automate sales and marketing efforts and to prioritize high-scoring leads for faster or more tailored action. Lead scoring can lower customer acquisition costs, boost conversion rates, accelerate sales cycles, and improve alignment between sales and marketing strategies.
  • lifetime value (LTV) prediction: Lifetime value (LTV) is the total amount of revenue a business can expect to take in from a specific customer over the entire time period that the customer is actively engaged with the company. Predicting LTV (sometimes called CLV, CLTV, or pLTV for predictive LTV) can highlight customers who might receive special offers and inform strategies to retain the most valuable customers.
  • look-alike modeling: Look-alike modeling is an approach that seeks to identify the behaviors, demographics, and other shared traits of your ideal customers. Using those ideal customers as a “seed set,” a mathematical model can find other prospects or new customers with similar characteristics. You can then target these look-alike customers with outreach and messaging to help them become equally high-value in the long term.
  • machine learning (supervised and unsupervised): Machine learning is an area of data science that helps computers learn in ways similar to human learning. Supervised machine learning methods use data from the past to find patterns that can inform a mathematical model. The model is refined until it does a good job of matching what happened in the past data. Then, the model can make predictions about the future using new data from the present time. Unsupervised machine learning looks for patterns in data where there isn’t a clear pre-existing structure. For example, clustering is a form of unsupervised machine learning that tries to identify groups of similar items or people.
  • machine learning operations (MLOps): Machine learning operations, or MLOps, includes all the work that surrounds the machine learning model development process. It encompasses making sure data are available, providing access to data for analysts, integrating models into business workflows, and monitoring and updating models to ensure they are performing well. In large organizations, MLOps may require multiple people and/or teams. However, these processes can also often be largely automated as part of a predictive analytics platform.
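
Mean imputation is one of the simplest such methods. This pure-Python sketch (the function name is my own) replaces missing entries with the mean of the observed values:

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

# The missing entry becomes the mean of 10, 20, and 30 -> 20.0
print(impute_mean([10, None, 20, 30]))
```
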
  • marketing performance management (MPM): Marketing performance management includes a variety of services and technological tools that improve marketing teams’ capabilities in using data, obtaining actionable insights, generating predictions, and generally improving marketing campaigns. This approach optimizes marketing efficiency and makes the best use of resources allocated to marketing.
  • Mobile App Installs (MAI) optimization: Mobile App Installs (MAI) optimization is a type of campaign optimization on Facebook (Meta). The platform finds users who are likely to install your app. These users will ideally also be reachable at a low CPM and fit your audience targeting parameters. These users are less likely to buy or engage with in-app purchases, but can help monetize your application or mobile game via in-app advertising.
  • model: In the context of machine learning, a model is a specific instance or example of an algorithm that has been created based on a particular dataset and that can be used on new data to generate predictions or find patterns.
  • model drift: Predictive models can perform well at first, but it’s common that their performance can decrease somewhat over time. For example, once a predictive model is implemented, the related business changes may alter the outcomes that occur, and so the model may need to be adjusted to fit the new reality. The relationships among the variables have changed, and the model has to be updated as well. To ensure the best ROI from predictive models, their performance should be monitored to catch model drift and adjust as required. Automated monitoring tools can help address this concern.
  • model training time: The time it takes to train a machine learning model varies. Some important factors in the time required include the quantity and complexity of the data, the specific algorithm being used for the model, and the computing power available for training. Simple models built on small datasets can be trained quickly on a typical laptop, while large datasets typical of many businesses will require more time and more computing capacity.
  • model training, validation, and testing: Training, validation, and testing are parts of the machine learning model building process. Using historical data, the model is trained and “learns” to identify patterns and trends. The model is then validated through comparing its outputs to known outcomes for a second dataset. The model is evaluated on its ability to specify correctly or get close to the true outcome or target variable in that second dataset. This stage of evaluating the model allows its builder to see how well it is performing and to compare it to different versions of the model. Those different versions may use other predictive approaches or be set up in different ways that offer better or worse performance. Finally, the model is tested on a third set of data it has never seen before, allowing its builder to judge its likely performance when it is deployed.
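
A minimal sketch of the three-way split described above (the 70/15/15 proportions and the function name are illustrative assumptions, not a standard):

```python
import random

def train_validation_test_split(rows, train=0.7, validation=0.15, seed=42):
    """Shuffle rows and cut them into train/validation/test partitions."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded for reproducibility
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * validation)
    return (rows[:n_train],
            rows[n_train:n_train + n_val],
            rows[n_train + n_val:])

train_set, val_set, test_set = train_validation_test_split(range(100))
print(len(train_set), len(val_set), len(test_set))  # 70 15 15
```
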
  • monthly recurring revenue (MRR): Monthly recurring revenue (MRR) is the total revenue generated monthly by all of a business’s active subscriptions. MRR does not incorporate all the revenue a company can generate monthly, as there may be revenue from non-recurring sources. Instead, it is the total of the regular monthly payments the company receives. In addition, monthly recurring revenue will include revenue generated from recurring add-ons but not include one-time customizations.
  • net revenue retention (NRR): Net revenue retention (NRR) is the revenue retained during a period from customers, taking churned customers into account. Along with MRR and ARR, it’s a valuable metric for recurring revenue. In addition, NRR is a reliable metric for analyzing customer interaction and gauging the “stickiness” of a product or service.
  • neural network: Neural networks are used in predictive modeling. Their construction is based loosely on human biology. They are constructed of a series of algorithms that each carries out its own specific operation on the data, then passes its results to the next layer, until an output layer is reached and a final output or prediction is made. Neural networks can be used on many kinds of data, but complex networks with many layers are most used in deep learning for challenging data like images, video, and text.
  • optimization: Broadly speaking, optimization is a process used to either maximize or minimize an output value by selecting the right input values. In data science, this process involves creating a mathematical model that can identify the right input values to reach a desired outcome. Examples of optimization might include marketing campaign optimization (i.e., allocating money to the right channels for the best results) and supply chain logistics (e.g., optimizing transportation options to maximize speed and sustainability). Machine learning models can be used for this kind of optimization.
  • overfitting: Overfitting occurs when a machine learning model learns its training data too well, and then tries to apply a pattern too tightly defined by that training data to new data it encounters. The model will seem very accurate when evaluated on the training data, but it will not generalize well to new data and will perform poorly.
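
One common formulation computes NRR from starting MRR plus expansion revenue, minus contraction and churned revenue. The sketch below assumes that formula; the function name is my own:

```python
def net_revenue_retention(start_mrr, expansion, contraction, churned):
    """NRR over a period, as a percentage of starting recurring revenue."""
    return 100 * (start_mrr + expansion - contraction - churned) / start_mrr

# Example: $100k start, $15k expansion, $5k contraction, $8k churned -> 102.0%
print(net_revenue_retention(100_000, 15_000, 5_000, 8_000))
```

A result above 100% means expansion from existing customers more than offset churn during the period.
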
  • precision: In predictive analytics, precision shows what proportion of a machine learning model’s positive identifications were actually correct. As an example, imagine a machine learning model that is trained to recognize cats or dogs in photos. Its precision is the fraction of photos the model says contain a dog that actually do contain a dog. Also known as positive predictive value.
  • prediction: A prediction is the ultimate goal of a predictive model. In Pecan, a prediction is often tied to a specific customer. After learning from data and applying mathematical analysis, a model generates a specific numeric value representing the outcome of interest for the model builder. For example, the prediction could numerically represent the likelihood that a customer will churn or the customer’s likely lifetime value.
  • predictive analytics: Predictive analytics uses data, statistics, and machine learning techniques to build mathematical models that can generate predictions about things likely to happen in the future. Predictive models are trained to identify patterns and trends that are likely to recur. Predictive analytics is used in a wide variety of industries, including banking, insurance, retail, CPG, manufacturing, e-commerce, food and beverage, and more.
  • predictive marketing: Predictive marketing is the integration of predictive analytics and machine learning into marketing practices. Specifically, data science techniques can be used to understand and predict customer behavior, allowing marketers to proactively respond to customers’ interests and behavior. Among many uses, predictive marketing can increase the success of upsell and cross-sell offers, improve conversion rates from email campaigns, and provide insights for campaign optimization.
  • prescriptive analytics: Prescriptive analytics is related to descriptive and predictive analytics, and can be considered a final step in a data-driven decision making process. Specifically, prescriptive analytics guides the best course of action to be taken in a business situation, as informed by the data that has been analyzed and used in predictive modeling. Prescriptive analytics is used in many industries, including insurance, manufacturing, and human resources (i.e., people analytics).
  • recall: In predictive analytics, recall shows what proportion of the actually relevant cases were correctly identified by a machine learning model. As an example, consider a machine learning model that is trained to recognize cats or dogs in photos. Calculating its recall would be based on how often it actually recognized the dogs out of all the photos that truly contained dogs. Also known as sensitivity.
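
Precision and recall can be computed together from paired predicted and actual labels. A minimal sketch with an invented function name, using "dog" as the positive class as in the photo example:

```python
def precision_recall(predicted, actual, positive="dog"):
    """Precision and recall for one positive class from paired labels."""
    tp = sum(p == positive and a == positive for p, a in zip(predicted, actual))
    predicted_pos = sum(p == positive for p in predicted)  # all "dog" calls
    actual_pos = sum(a == positive for a in actual)        # all true dogs
    return tp / predicted_pos, tp / actual_pos

pred = ["dog", "dog", "cat", "dog", "cat"]
true = ["dog", "cat", "cat", "dog", "dog"]
# 2 of 3 "dog" calls are right (precision), 2 of 3 true dogs found (recall)
print(precision_recall(pred, true))
```
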
  • recency-frequency-monetary (RFM): RFM analysis is a tool used in marketing to score customers based on three categories: the recency, frequency, and monetary value of their purchases. This approach lets companies see which customers have the highest likelihood of becoming ongoing purchasers, how revenue is distributed among new and established customers, and who are the highest-value customers.
  • regression: Regression models are used in statistics and machine learning to represent the relationship among variables. These models can show the strength of their relationships and also offer insight into how the variables affect each other. In predictive analytics, regression models can be used to predict a value for an outcome variable, given the values for other variables that are related to it. For example, based on records of customer behavior, we can predict customers’ lifetime value (also known as CLV, LTV, CLTV, or pLTV).
  • return on ad spend (ROAS): Return on ad spend (ROAS) is a metric used to assess the performance of marketing efforts. It is equal to the amount of revenue generated for each dollar expended on a campaign. Similar to return on investment (ROI), it demonstrates how much profit was produced by ad spending. ROAS can be calculated at multiple levels: individual ads, campaigns, or an entire marketing strategy. It is a crucial key performance indicator (KPI) when looking at the success of advertising.
  • SHAP values: SHapley Additive exPlanations, aka SHAP values, quantify the effect of each feature in your model on the predictions generated by that model. The values are calculated by comparing the output of your model with each feature to its output without each feature. SHAP values can be calculated “globally” for all the model’s features, taken as a whole; they also can be calculated “locally” for each individual prediction to show which features most affected that prediction.
  • share of wallet (SOW): Share of wallet (SOW) is the regular amount of money a consumer spends with a specific brand, compared to what they spend with competitors within the same category. Customers likely have a set budget they will spend on a type of product or a product category or service. Therefore, share of wallet (SOW) is the percentage of that customer’s budget spent with one brand instead of competing brands. Share of wallet is a metric typically used by business-to-consumer (B2C) or direct-to-consumer (D2C) companies.
  • supply chain analytics: Supply chain analytics refers to using data about supply chain processes to garner insights into the processes and — in predictive analytics — to generate predictions that can be used to improve and streamline those processes. Predictive approaches can be used to try out different scenarios and select the best options, as well as to plan for potential situations that could arise. Predictive analytics applied to supply chain challenges can identify risks, find trends, and minimize costs. Supply chain decision making that is informed by predictive analytics can also increase resiliency in unpredictable times.
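
A toy illustration of RFM scoring. The 1-3 point scale and the specific cutoffs below are arbitrary assumptions for demonstration, not a standard:

```python
def rfm_score(days_since_last, purchase_count, total_spend):
    """Toy RFM score: 1-3 points per dimension, higher is better.
    Cutoffs are illustrative only."""
    r = 3 if days_since_last <= 30 else 2 if days_since_last <= 90 else 1
    f = 3 if purchase_count >= 10 else 2 if purchase_count >= 3 else 1
    m = 3 if total_spend >= 500 else 2 if total_spend >= 100 else 1
    return r, f, m

# A recent, frequent, high-spending customer -> (3, 3, 3)
print(rfm_score(days_since_last=12, purchase_count=14, total_spend=800))
```
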
  • target variable (or outcome variable): The target variable in a machine learning model is the variable you want to predict. For example, you might want to predict whether a customer is likely to churn or not. Also known as outcome variable, dependent variable, or target.
  • underfitting: Underfitting occurs when a machine learning model has not learned well from its training data and hasn’t recognized a pattern that it can apply accurately to new data. The model may be too simple to capture meaningful patterns in the training data. Underfit models will perform poorly on training and test data.
  • upsell model: Upsell models are developed based on customer data and can identify which customers might be likely to buy a higher level or additional product or service. The goal of these models is to offer the right customer the right upsell offer at the right time. Predictive analytics can use upsell models to improve conversion rates and generate greater revenue.
  • Value Optimization (VO): Value Optimization (VO) is a type of campaign optimization on Facebook (Meta). It focuses on finding the highest-value users for your application or mobile game. This form of optimization can provide better return on ad spend (ROAS) than another popular optimization method, App Event Optimization (AEO), but it may also result in a higher cost per install.