Asking a bot a data question may not be the magical solution we have all been waiting for.

While I do believe that LLM-powered applications will meaningfully change how we live and work, I am not convinced that this is a breakthrough technology or UX paradigm for the data domain.

Even setting aside the technical limitations of LLMs and their relative strength in modeling words over numbers, I routinely come across organizations that:

  1. are struggling to build foundational measurements from a hodge-podge of sources
  2. have defined metrics, but frequently experience confusion over the definitions
  3. are unable to control the chaos and bloat in their metric definitions
  4. or have key metrics but poor segmentation and inadequate capabilities to slice, dice, and analyze
  5. operate on averages with limited granular and actionable insights
  6. build lots of reports but most are neither viewed nor maintained
  7. or disseminate reports that consumers view but don’t find actionable
  8. keep adding to the “insights” request queue of the data team

The canonical use case I’ve seen so far is using natural language to ask questions: “give me the sales in NYC in Jan” or “what are the top 5 channels for new customers in March?” Reducing friction to access factoids is valuable, but many organizations already have reports galore capturing all types of measurements.

The constraint is in knowing what questions to ask; translating business semantics into which datasets to use, how, and for what use case; doing the analytically rigorous work; slicing and dicing; using prior knowledge and context to interpret the data correctly; and ultimately making a decision that moves the business forward.

In addition, enterprises contain complex, connected business processes, and we are still scratching the surface of how data can model and understand these connections. It is unclear whether AI can handle the complexity of the data domain beyond the retrieval of basic facts.

I can be persuaded. If, in response to a question, a bot can read data and metadata, perform analysis, write a summary of observations, and suggest next steps, I will be intrigued. But until then, I will be happy to see this technology make a huge difference in the day-to-day life of data engineers and data scientists as they go about writing code and documentation.