How to collect and use the right data about your news audience
To improve our websites, boost reader engagement and grow subscriptions, we don’t want “more data.” We want deep insights leading to a better understanding of visitors’ needs and how they interact with our journalism.
The actual data (web metrics, advertising impressions, and newsletter subscriptions to name a few) are just the raw materials. It is not just collecting that raw data, but using it to inform your decisions, that is a foundational requirement of a subscription program.
Depending on the size of your organization, that may not require a fully built-out data warehouse, a staff of data scientists and a million-dollar marketing suite. But it does require an attention to processes, skills and technologies that may be unfamiliar to many traditional news organizations.
Regardless of the size of your organization, it is likely that the systems and skills needed to run a successful subscription program are currently spread across multiple departments and executive stakeholders.
For instance, the responsibility for advertising revenues and audience revenues commonly resides in different departments, a legacy of newspapers’ print-centric organizational structures.
In the digital world those two businesses are tightly intertwined, and success requires shared goals and close collaboration. The first step is to understand the state of your current data sources and processes.
Perform an enterprise-wide data audit
Create a spreadsheet that lists any system in your organization that creates or holds business data. Note your primary data sources and what each contains, what other systems depend on that data, which staff members access and analyze it, and how it is used. Also highlight where personally identifiable information is involved.
The goal is to capture every operational data source, representing every major internal department or business strategy, and inventory both the broad and specific types of information you have access to. If any data warehouses are already operating within the company, those should also be flagged immediately.
For a print media organization some typical categories of data might include:
- Advertising performance
- Digital subscription records
- Digital content analytics
- Financial reports
- Marketing campaigns
- Newsletter subscribers
- Print circulation records
- Registered users (non-paying)
Within each of those categories might be dozens or hundreds of individual data points: subscriber counts and email addresses for newsletters, the number of daily impressions and clicks for advertising campaigns, or a trend of monthly unique web visitors. Write down as much detail as is practical, and understand you will revisit this audit frequently as your plans develop.
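One way to start the audit spreadsheet is sketched below. It is a minimal illustration, not a prescribed template: the column names, the system names and the sample rows are all invented for this example, and you would replace them with whatever your own audit turns up. It writes the inventory as CSV text that any spreadsheet tool can open and pulls out the systems flagged as holding personally identifiable information.

```python
import csv
import io

# Hypothetical audit columns, based on the items the audit asks you to note.
FIELDS = [
    "system", "category", "contains", "dependent_systems",
    "accessed_by", "used_for", "has_pii",
]

# Illustrative rows only; the system names and values are invented.
audit_rows = [
    {
        "system": "Newsletter platform",
        "category": "Newsletter subscribers",
        "contains": "email addresses; subscriber counts",
        "dependent_systems": "Marketing campaigns",
        "accessed_by": "Audience team",
        "used_for": "Weekly growth report",
        "has_pii": "yes",
    },
    {
        "system": "Web analytics",
        "category": "Digital content analytics",
        "contains": "monthly unique visitors; page views",
        "dependent_systems": "Editorial dashboards",
        "accessed_by": "Newsroom editors",
        "used_for": "Daily traffic review",
        "has_pii": "no",
    },
]


def write_audit(rows):
    """Return the audit as CSV text with one row per system."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def systems_with_pii(rows):
    """List the systems where personally identifiable information is involved."""
    return [row["system"] for row in rows if row["has_pii"] == "yes"]


audit_csv = write_audit(audit_rows)
pii_systems = systems_with_pii(audit_rows)
```

Keeping one row per system makes it easy to add columns later as you revisit the audit.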
Conduct a staff and skills analysis
Create an inventory of the people, roles and skills currently in your organization that handle data. Focus on the data sources identified in the audit above and document who accesses the raw data; who turns it into spreadsheets, reports or dashboards; who reviews the data to make daily or weekly estimates of business performance; and who analyzes and formats it with recommendations for executive decision making.
Here’s a checklist of questions to address in your skills analysis:
- How many people currently work with those data sets, either gathering and analyzing them or using them for decisions and actions? (Typically your reader-focused data is shared across multiple departments. Improving the collaboration between these teams is often a first step.)
- What specific data does each person/group use, and do they access the original source of the data or are they only seeing summaries? (As data is shared around the company some of the original context may be lost, leading to a lack of trust in the numbers.)
- What specific reports and analysis does each person use or perform? Who is the intended audience for each report?
- How often is the data processed and provided to stakeholders for review? (Increasing the speed of decision-making requires data to be analyzed and shared with executives at more frequent intervals.)
- What tools are used to access, process and present data?
- What raw data or pre-set reports do decision-makers access directly?
- For each report or dashboard, what decisions are made based on the data? Who owns those decisions?
- How much time is spent monthly on accessing, processing and analyzing data?
- What process and tool improvements are recommended by the staff who work most closely with your data?
- What specific skills are held by your current “data staff” (such as SQL, Excel, statistics, and database administration)?
Review those findings as your planning continues and begin to identify gaps in both skills and processes. These gaps will become the areas of focus as you hire for new positions and reorganize current roles to improve your data gathering.
Prioritize data needs
What business questions are you currently unable to answer? These could include average revenue per user; the cost of acquiring new subscribers; which digital subscribers also receive an email newsletter; how often the average print user visits the website; and the average number of visits before a reader becomes a subscriber.
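Two of these questions reduce to simple arithmetic once the underlying figures sit in one place. The sketch below assumes the most basic definitions (revenue divided by active subscribers, and marketing spend divided by new subscribers in the same period). Your finance team may define these metrics differently, and every number here is invented for illustration.

```python
def average_revenue_per_user(total_revenue, active_subscribers):
    """Revenue for a period divided by the subscribers active in that period."""
    if active_subscribers <= 0:
        raise ValueError("active_subscribers must be positive")
    return total_revenue / active_subscribers


def cost_per_acquisition(marketing_spend, new_subscribers):
    """Acquisition spend for a period divided by subscribers gained in that period."""
    if new_subscribers <= 0:
        raise ValueError("new_subscribers must be positive")
    return marketing_spend / new_subscribers


# Invented monthly figures, for illustration only.
arpu = average_revenue_per_user(total_revenue=48_000.0, active_subscribers=4_000)
cac = cost_per_acquisition(marketing_spend=9_000.0, new_subscribers=300)

print(f"Average revenue per user: ${arpu:.2f}")   # 48,000 / 4,000
print(f"Cost per new subscriber: ${cac:.2f}")     # 9,000 / 300
```

The arithmetic is trivial. The hard part is that revenue, subscriber counts and marketing spend often live in separate systems, which is why the audit comes first.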
How might connecting some of these data sets enable you to make decisions and take action? A broad roadmap for data integration might be:
- Bring in two closely related data sources (for example, print and digital subscriptions), use the findings for “offline” targeted marketing (email campaigns, direct mail) and gain early insights.
- Integrate subscriber data, email newsletters and aggregate analytics data to begin simple on-site targeting of messaging, offers and content; build business intelligence dashboards.
- Integrate audience, advertising, analytics and editorial data to build comprehensive reader profiles and support automated marketing and engagement tactics and machine-learning driven insights.
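The first step of that roadmap can be sketched in a few lines. The example below assumes both subscription systems can export an email address for each customer and uses that as the matching key. Real records often need fuzzier matching, on mailing address or name for example. The field names and sample records are hypothetical.

```python
def normalize_email(email):
    """Trim whitespace and lowercase so the same address matches across systems."""
    return email.strip().lower()


# Hypothetical exports from two separate subscription systems.
print_subscribers = [
    {"name": "A. Reader", "email": "A.Reader@example.com "},
    {"name": "B. Reader", "email": "b.reader@example.com"},
]
digital_subscribers = [
    {"email": "a.reader@example.com", "plan": "monthly"},
    {"email": "c.reader@example.com", "plan": "annual"},
]


def split_by_overlap(print_rows, digital_rows):
    """Separate print subscribers who also pay for digital from print-only ones."""
    digital_emails = {normalize_email(row["email"]) for row in digital_rows}
    both, print_only = [], []
    for row in print_rows:
        email = normalize_email(row["email"])
        if email in digital_emails:
            both.append(email)
        else:
            print_only.append(email)
    return both, print_only


both, print_only = split_by_overlap(print_subscribers, digital_subscribers)
```

Here `print_only` would be a natural list for an "offline" email or direct-mail campaign promoting digital access.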
Data and analytics terms to know:
AI (Artificial Intelligence): Technically describes intelligent machines that are “self aware” enough to adapt to their environment in order to achieve defined goals. Often misapplied to systems that use algorithms to examine large data sets and make discrete choices based on analysis — which is better described as “machine learning.”
Algorithm: A set of rules (from simple to complex) followed by a computer to solve a specific problem.
Best of Breed: The acknowledged leader in a specific technology or category. Typically referenced when building a complex system and buying and assembling individual parts instead of sourcing from a single vendor.
BI (Business Intelligence): The technology systems and process of using data to provide insights into business operations to inform decision making.
Database of Record: The canonical source of data for a particular business system. Also see “Operational Database” below.
Data Governance: Structure and policy dictating what data is utilized in the system and how it is transferred and processed to maintain integrity and business value.
Data Lake: Similar to a data warehouse but the data is stored in its native format. This is often a first step in the data collection process, allowing the data to be gathered in one location before it is normalized.
Data Mart: A portion of a data warehouse that contains information from only a single department.
Data Repository: A generic term to describe any method of storing enterprise data.
Data Warehouse: A collection of company data organized to support business goals and decisions. The data format is adjusted and normalized.
ETL (Extract, Transform, Load): The process of gathering data from disparate systems, normalizing it to align with your other data sources and then uploading it to your central data storage system.
First-Party Data: Information on visitors and customers directly collected and stored by the publisher. (An advertiser could also have first-party data they use to target visitors on your site.)
ML (Machine Learning): In this context, it is the use of algorithms to create predictive models of user behavior. For example, a large volume of data is analyzed and a “propensity to subscribe” value is assigned to individual visitors based on their similarity to people who previously subscribed.
SQL (Structured Query Language): Used to search for specific sets of records in a database.
Third-Party Data: Information on visitors with whom you do not have a prior relationship. This data is often licensed or purchased to support targeting of advertising.
PII (Personally Identifiable Information): Data in your system that is not anonymous and can be connected to an individual visitor. For example: name, email, credit card number, mailing address. PII must be carefully managed within the system.
UID (Unique Identifier): A serial number created to recognize the same visitor across different platforms. Used to aggregate reader activity into a single record for analysis.
Visualization Tool: Software that allows the analysis of data and the creation of charts, graphs and reports to aid in the understanding of business performance.
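Several of the terms above (ETL, SQL and a central data store) can be seen working together in a small sketch. It uses Python's built-in sqlite3 module with an in-memory database standing in for a data warehouse. The table names, column names and sample rows are assumptions made for illustration, and a production pipeline would involve far more validation.

```python
import sqlite3

# Extract: rows as they might arrive from two source systems (invented data).
newsletter_export = [("Ana@Example.com ", "Morning Briefing")]
subscription_export = [
    ("ana@example.com", "digital-monthly"),
    ("raj@example.com", "digital-annual"),
]


def transform(email):
    """Normalize emails so records from different systems line up."""
    return email.strip().lower()


def load(conn):
    """Create the central tables and load the normalized rows into them."""
    conn.execute("CREATE TABLE newsletters (email TEXT, newsletter TEXT)")
    conn.execute("CREATE TABLE subscriptions (email TEXT, plan TEXT)")
    conn.executemany(
        "INSERT INTO newsletters VALUES (?, ?)",
        [(transform(email), name) for email, name in newsletter_export],
    )
    conn.executemany(
        "INSERT INTO subscriptions VALUES (?, ?)",
        [(transform(email), plan) for email, plan in subscription_export],
    )


conn = sqlite3.connect(":memory:")
load(conn)

# SQL: which digital subscribers also receive an email newsletter?
rows = conn.execute(
    """
    SELECT s.email, s.plan, n.newsletter
    FROM subscriptions AS s
    JOIN newsletters AS n ON n.email = s.email
    ORDER BY s.email
    """
).fetchall()
```

Once both sources are loaded into one place with a shared key, a question that previously required comparing two exports by hand becomes a single query.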