BI / Text-to-SQL · For the Slalom team

Every Slalom data-practice client has lived some version of the same story. The CEO wants natural-language analytics. The data team tried, hit 60% accuracy, and shut it down. The CEO is asking again next quarter. This is the highest-value, lowest-risk wedge into a Slalom AI engagement, because the success criterion is mathematical and the business owner is already begging for it. GoodMem is the platform that closes the gap.

The pain, in the client's words

"My CEO wants ChatGPT for our data. We tried. The thing gets the right answer maybe six times out of ten. The other four times it makes up a column name or quietly returns wrong numbers. So we shut it down and went back to Tableau. The CEO is asking me again next quarter."

Why the standard playbook stalls

The default approach is to drop a model in front of a database, give it the schema in the prompt, and pray. It doesn't work. Text-to-SQL accuracy is not a model problem. It's a retrieval and context problem. The model doesn't know which 20 tables out of 800 are relevant to the question. It doesn't know that "revenue" in finance means something different than "revenue" in product. It doesn't know which historical query pattern looks most like the new one. So it guesses, plausibly, and ships wrong numbers.

Adding more model doesn't fix this. A bigger context window doesn't fix this. The fix is tuning the layer that decides what the model gets to see.

The model is not the problem. The system around the model is. GoodMem is that system.

What GoodMem does differently

GoodMem sits between the user's question and the model. On every query it does the work a senior data engineer would do if they had ten seconds to brief the model.

Index everything that matters. Schema, historical queries, data dictionary, business glossary, all into a memory store optimized for the client's domain.
Pick the right embedder. Finance schemas don't embed the same way as healthcare claims. Agent Tuner tunes the embedder for each domain.
Retrieve precisely. Agent Tuner reranks candidate tables, columns, and prior queries so that only the most relevant handful reaches the prompt. The model gets the right context, not the kitchen sink.
Verify before execution. PAIRity validates the generated SQL against the schema and against safety rules before the query ever runs.
Learn from outcomes. Telemetry from every interaction (what worked, what failed, what the user accepted) feeds back into the next round of tuning.
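
As a rough illustration of the retrieval step in the list above, the sketch below ranks a few context items against a question and keeps only the top matches. It is not GoodMem's or Agent Tuner's API. The `ContextItem` type, the `retrieve` function, the sample table names, and the token-overlap score (a crude stand-in for a tuned embedder plus reranker) are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    """Hypothetical unit of indexed context: a table, glossary entry, or prior query."""
    kind: str
    name: str
    text: str


def tokens(s: str) -> set[str]:
    # Lowercase alphanumeric tokens; underscores split identifiers into words.
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def overlap_score(question: str, item: ContextItem) -> float:
    # Jaccard overlap between question tokens and item tokens.
    # A production system would use embeddings and a reranker instead.
    q = tokens(question)
    t = tokens(item.name + " " + item.text)
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def retrieve(question: str, items: list[ContextItem], k: int = 3) -> list[ContextItem]:
    # Score every item once, sort best-first, and keep at most k non-zero matches.
    scored = [(overlap_score(question, item), item) for item in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for score, item in scored[:k] if score > 0]


# Illustrative data only; these names are not from any real warehouse.
ITEMS = [
    ContextItem("table", "fin_revenue_monthly", "recognized revenue by month and region for finance"),
    ContextItem("table", "hr_headcount", "employee headcount by department"),
    ContextItem("glossary", "revenue", "finance: recognized revenue; product: gross bookings"),
    ContextItem("table", "web_sessions", "website sessions by day and channel"),
]

top = retrieve("What was recognized revenue by region last month?", ITEMS, k=2)
print([item.name for item in top])
```

The point of the sketch is the shape of the step: out of many candidates, only the few best-matching items are passed on to the prompt.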

Accuracy and cost-per-query improve measurably as the system tunes itself on real client traffic.

Architecture, in three sentences

The client's data warehouse stays where it is (Snowflake, Databricks, BigQuery, whatever). GoodMem sits on top as the memory layer; Agent Tuner tunes retrieval and reranking against the client's query history; verification fires before any query executes. The model can be OpenAI, Anthropic, or open-weight running in the client's VPC; the same memory and tuning loop runs against any model the client picks.
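
To make "verification fires before any query executes" concrete, here is a minimal sketch of one way such a check could work. It is not PAIRity's implementation. It assumes the schema can be mirrored as empty tables in an in-memory SQLite database, which ignores dialect differences with Snowflake, Databricks, or BigQuery. The `verify_sql` function and the sample `orders` table are hypothetical.

```python
import sqlite3

# Authorizer actions a read-only query is allowed to perform.
ALLOWED_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ, sqlite3.SQLITE_FUNCTION}


def _read_only(action, arg1, arg2, db_name, source):
    # Deny anything that is not a plain read (INSERT, DELETE, DROP, ...).
    return sqlite3.SQLITE_OK if action in ALLOWED_ACTIONS else sqlite3.SQLITE_DENY


def verify_sql(sql: str, schema_ddl: list[str]) -> tuple[bool, str]:
    """Check generated SQL against an empty shadow copy of the schema.

    Returns (True, "ok") if the statement compiles as a single read-only
    query, otherwise (False, reason). No real data is touched.
    """
    conn = sqlite3.connect(":memory:")
    try:
        for ddl in schema_ddl:
            conn.execute(ddl)              # build the empty shadow schema
        conn.set_authorizer(_read_only)    # safety rule: reads only
        conn.execute(sql)                  # unknown tables or columns raise here
        return True, "ok"
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return False, str(exc)
    finally:
        conn.close()


# Hypothetical schema for illustration.
SCHEMA = ["CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)"]

good = verify_sql("SELECT region, SUM(amount) FROM orders GROUP BY region", SCHEMA)
bad_column = verify_sql("SELECT revenue FROM orders", SCHEMA)
unsafe = verify_sql("DELETE FROM orders", SCHEMA)
print(good, bad_column, unsafe)
```

A made-up column name and a destructive statement are both rejected before anything reaches the warehouse, which are the two failure modes the client quote describes.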

Proof point: Incorta

Incorta runs GoodMem in production for this exact pattern. We have measured improvements on the CRUMB benchmark (a public retrieval-quality test) and on Incorta's own customer query logs:

Double-digit accuracy gains on real customer queries vs. an unoptimized baseline
Material MRR (mean reciprocal rank) improvement on retrieval against their schema
Cost-per-query down because GoodMem routes simple questions to smaller models
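
For readers less familiar with MRR, the sketch below shows how mean reciprocal rank is computed: for each question, take 1 divided by the rank of the first relevant item retrieved (0 if none was retrieved), then average across questions. The function name and the sample data are illustrative assumptions, not Incorta's results.

```python
def mean_reciprocal_rank(ranked_results: list[list[str]], relevant: list[set[str]]) -> float:
    """Average of 1/rank of the first relevant item per query (0 if none found)."""
    if not ranked_results:
        return 0.0
    total = 0.0
    for results, rel in zip(ranked_results, relevant):
        for rank, item in enumerate(results, start=1):
            if item in rel:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_results)


# Three made-up questions: relevant table found at rank 1, at rank 2, and not at all.
retrieved = [
    ["orders", "customers"],
    ["web_sessions", "orders"],
    ["hr_headcount", "web_sessions"],
]
relevant = [{"orders"}, {"orders"}, {"fin_revenue_monthly"}]

mrr = mean_reciprocal_rank(retrieved, relevant)
print(mrr)  # (1 + 1/2 + 0) / 3 = 0.5
```

A higher MRR means the right table or prior query tends to appear nearer the top of the retrieved list, which is what lets the prompt stay small.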

When a Slalom client asks whether this actually works, Incorta is the proof point. Stand up a reference call.

How the engagement is shaped

A 30- to 60-day pilot on a defined slice of the warehouse with a measurable accuracy target. BI moves fast: success is mathematical and the data team can validate inside a sprint. Slalom delivers discovery and integration. PAIR Systems provides the GoodMem platform and a narrow band of professional services to configure the optimization layer. If the pilot hits the target, the engagement converts to a 12- or 24-month GoodMem license alongside a multi-month services engagement on integration, training, and governance. The same license anchors expansion into knowledge-base search and ServiceNow modules.

The pilot is structured to de-risk the client. If the numbers aren't there, the client doesn't sign. That risk-shift is what lets a Slalom Account Manager open the conversation without asking the client to commit to a six-month consulting engagement upfront.

Discovery questions for the first conversation

What's the executive request behind your AI program: a specific use case or a generic "do AI" mandate?
How many tables and columns are in the warehouse the agent would query?
Do you have a library of SQL queries your analysts already wrote?
What's your current accuracy on natural-language queries, even informally?
Who owns this: the data team, the AI/ML team, or a business leader?

Business-leader-owned engagements close fastest. Anything below 80% accuracy on natural-language queries is a strong fit. Most clients are at 50 to 65%.

The one-line pitch the Account Manager walks in with

"Your CEO wants ChatGPT for your data. Slalom and PAIR Systems have done this before. We can stand up a 30- to 60-day GoodMem pilot with a measurable accuracy target on a defined slice of your warehouse, and if the numbers aren't there, you don't sign. Let's pick the slice."

"ChatGPT for our data," that actually works.