Skip to content

what is big data and what is people data?

Published by Rupert Morrison 

defining big data and people data

What’s the difference between ‘big data’ and ‘people data’? Big data is defined as data of high volume, high variety and high velocity. People data can be defined as datasets describing people in organizations. The two types of data will overlap in some cases and I’m sure there will be other interpretations of the differences. I’ve found it useful to try to specify what’s different between two important sources of data for analytics.

DimensionBig DataPeople Data
StructureBig datasets are better structuredPeople data frequently changes its structure, with new dimensions being added
StabilityBig datasets are more stablePeople data is less stable, with the frequent creation of new datasets
SizeBig datasets are larger, extending to billions of rowsPeople datasets are smaller, usually ranging from 100 to 100,000 rows
SystemsBig datasets are stored on systems specifically configured for the dataset by database professionalsPeople datasets are created and stored on a mix of systems, including many in Excel
CleanlinessBig datasets are relatively clean – with partial records cleansed or omittedPeople datasets often have high percentages of gaps and errors
InteractionBig datasets tend to be used for reportingPeople datasets are often edited and added to with changes or comments which become part of the dataset (e.g. manager scoring of performance and notes on employee performance)
Source of ValueBig data can be valuable due to sheer volume, e.g. allowing regression analysis of the optimal care pathway for small subgroups amongst millions of healthcare patients. It can be valuable via linked analytics – for example, when Amazon nudges you with the suggestion that ‘People who bought this also bought…’ .The value of people data is in overcoming the incompleteness of data, visualizing it in meaningful ways, supporting interaction, modelling and allowing linked analytics: ‘Which site manager training course was linked to the greatest increase in site profitability?’

distinguishing the output of organizational design from the process of organizational design

Organizational design can be understood as an output (the thing you produce) or as an activity (something you do). I’m arguing that of course big data might change the output, in the sense of needing different numbers of people in different places. I don’t see that big data change the process. People data are different.

For the organizational design process, it’s not the large volume of data that makes a difference but the more unusual properties of people data. For example, of being imperfect, needing visualization, being reflexive and having linked relationships. These ideas are captured in the table below:

Where ‘People Data’ might affect the Organization Design process:

What makes it people data?ExampleChange to the structure?Change organizational design process?Comment
Imperfect data – incomplete or of doubtful veracityPeople data might be incomplete, but might still need to be used for org designCould allow simpler and more flexible structuresIncomplete data could be used for incomplete design – x% adaptableOrganizations have always been re-designed on the back of napkins! However, it would be innovative if organizations were designed consciously to cope with incomplete data.
Visualization of dataPeople data particularly relevant for expressing via color, size, shapes and hierarchical structuresNo direct impact on structureAdjustment more likely if managers see where costs, skills, customer impacts areVisualization could affect organization design – by giving a sense of the organization more intuitively, it might be more possible to achieve an organization design that makes sense to more people.
Reflexivity of dataPeople datasets can affect themselves – as expressed in the feedback loops in Silverman Research’s Social Media GardenDoesn’t change structure directlyYes, the org design process can evaluate its own progress and adaptAn exciting extension of the idea of group training environments where the group is explicitly invited to reflect on its own process, take ownership of it, and improve how it operates.
Linked relationshipsPeople data is unusually highly linked – to processes, costs, customers, skills, engagement etc.New role for strategy team / MI team?Yes, monitor systemic impact during the change processIt has always been hard to link and process the data. As this becomes possible on an ongoing basis, people will be more able to reconfigure their organizations as needed.

imperfect or incomplete data in organizational design

Incomplete data might be seen as a problem for organizational design. After all, an organizational design is meant to treat the whole organization as a system – linking people to roles to processes, to competencies to client deliverables and objectives. For example, if we do not know how people currently use their time, how can we understand the ‘as-is’? How can we instruct people to use their time in the ‘to-be’?

Linked datasets are valuable in addressing incomplete data because they expose gaps. We understand 100% of our costs. But in a changing world, can we link them to processes? Can we link them to clients? We have listed our organization’s risks but do we know who is responsible for each one? As people’s roles change and outside factors alter risk levels, can we track who is overloaded? It is easier to sense-check this kind of analysis through linked datasets than it is through simple ‘one aspect’ datasets.

Incompleteness may be deliberate. Google’s 20% time is based on the idea that the most valuable innovations may come from unexpected areas. Google has empowered its employees to spend part of their time on whatever seemed to them to be the most valuable use. It is arguable that this ‘unstructured’ time is actually structured and managed highly effectively. The work done, the choices made and the results achieved are reviewed by peers in briefing sessions. Google’s 20% time is an example of how organizations can be designed flexibly to include information gaps, to convert unstructured innovation to structured value.

For both kinds of organizational design – a fully ‘defined’ organization or a deliberately incomplete definition – linking is powerful. Linking one dataset to another helps the organization to be conscious of where it has specified its activities, skills, deliverables, risks and where it still has gaps.

visualization of data in organizational design

Our recent work with orgvue has shown us that visualization affects how organizations can design and manage themselves. The example below shows a business which had historically visualized its cost in tables of numbers or bar charts per division. We mapped people to processes allowing the business to see the cost per process (in this case, a simple example for the cost of HR processes). The impact for the business was to enable the organization to see itself as a system, to start making choices around processes that need to improve and work it wanted to do in new ways:

orgvue - Visualization of data in organizational design

reflexivity of data in organizational design

Silverman Research’s ‘Social Media Garden’ allows a large group to consider ideas reflexively – not only giving their suggestions, but considering each other’s suggestions, and responding, to develop ideas over several loops and allowing the socially constructed mass of ideas to evolve:

Reflexivity of data in organization design

This methodology for gathering group ideas genuinely differs from surveying due to its looped nature and differs from a ‘town hall’ meeting because of its greater potential scale and anonymity. It could be applied as a new method for organizational design, to surface issues and evaluate options.

linked relationships in organizational design

During the organizational design process, we’ve found it critical to be able to link aspects of the organization system to one another, so that impacts can be understood properly. In this example of an IT infrastructure supply company, we mapped (1) customers into customer segments, (2) processes to customer segments and (3) people to the processes that they carried out for customers. This allowed us to understand the connections between the customer segments and the true underlying cost, either at a customer level or a process level. This was vital for understanding true cost to serve per customer. It also helped drive real thinking about opportunities for process redesign.

Sample of process cost linked to customer segments:

Written by Rupert Morrison

CEO, orgvue

Rupert is an economist, recognized author, and leader in organization design, human capital management, and analytics. Formerly a strategy management consultant, Rupert is now CEO of orgvue and Concentra Analytics. His book "Data-Driven Organization Design" was shortlisted for the "2017 Management Book of the Year Award". He has dedicated his entire professional career to helping businesses revolutionize the way they see, plan, and manage their organizations through the innovative use of data and analytics.