07 The end of the Data Warehouse

Q: You said the era of the data warehouse is over. Isn’t that a little bold?

Rick Schaffer: I don’t think so — I think it’s just accurate. Data warehouses were never the right approach to the data sharing challenge.

Q: Why is that?

RS: For a lot of reasons, but if I had to sum it up I’d say the whole paradigm of the business intelligence/reporting/relational database model of data sharing was fundamentally incapable of meeting the infinite needs of the user.

Q: Infinite needs? Isn’t that overstating it just a bit?

RS: [laughs] You’ve never developed software, have you?

Q: Uh, no. But surely within government there must be some conceivable point in time when all user needs could be met.

RS: Hmm, yeah that’s true. Let’s see, when exactly does hell freeze over? [laughs]

Q: Okay, I guess not.

RS: No, it’s never-ending. The fact is reality evolves, organizations evolve, things change, needs change, and there is always a desire for greater efficiency. Needs are dynamic, and the data warehouse model was not a dynamic model. It’s too complex, too resource-intensive, and frankly just too cumbersome.

Q: But isn’t the concept of the data warehouse that you simply gather all of your data in one place...

RS: You’ve already hit the first flaw. You can’t put the words “simply” and “gather” together when you are talking about a relational database model, which is what data warehouses are built on.

Q: Why?

RS: Because a relational database model, by design, requires a level of precision that doesn’t match the reality of sharing data. It demands that you “relate” all of the data, but that’s very difficult to do when you pull it in from different sources. Then ETL came along as the magic bullet to solve this problem.

Q: ETL? Wasn’t that a ’70s rock band?

RS: No, you’re thinking of ELP — Emerson, Lake and Palmer.

Q: No, I was actually thinking of ELO — Electric Light Orchestra.

RS: Okay, well anyway... ETL stands for Extract, Transform, and Load. Turns out it is mostly a load. [smiles] So it’s going to grab all this different data, transform it to standardized protocols, and then load it into your data warehouse and everything is going to talk to everything else.

Q: Sounds reasonable to me.

RS: Right. And then you have a team of analysts trying to resolve the data inconsistencies to make ETL work, and they are compelled to adhere to this perfect model because they are working within the relational database framework. And it’s so complex that by the time they’re done the needs have already evolved. Change happens quicker than results, and so you have this inefficient, endless cycle.

Q: So the DW paradigm is not terribly flexible or adaptable, by its very nature, is what you’re saying?

RS: Exactly. The inherent modeling demands perfection, and neither data nor reality is perfect.

Q: And your antidote to this problem, the Simpler model?

RS: Use the Web as the data sharing platform. The Web has obsoleted the data warehouse; people just haven’t realized it yet.

Q: Do you mean essentially replicate the data warehouse on the Web?

RS: No no no no no. The relational model goes out the window. Just publish the data to the Web, wholesale you might say. Just publish all the necessary datasets. Any relating that needs to be done is accomplished seamlessly on the user end, as they pull together what they need to get their work done.

Q: But then don’t you just have a bunch of unrelated data scattered willy-nilly? How are you sharing data from, say, six different systems if you’ve got six different datasets out there, or tougher yet multiple datasets from multiple systems?

RS: Because the sharing occurs at the point of need, instead of being pre-configured. The user simply picks the datasets they need from system A, system B, and so on, using our toolset. Then they combine and interrelate them according to their needs.
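To make "sharing at the point of need" concrete, here is a minimal sketch in Python. It is an illustration only, not the Simpler toolset: the two datasets, the column names, and the helper functions are all hypothetical, and a real deployment would fetch published datasets from the Web rather than from inline strings. The point it shows is that nothing is related up front; the user picks two datasets and relates them on a shared key only when the need arises.

```python
import csv
import io

# Hypothetical datasets published by two separate systems (made-up data).
SYSTEM_A_CSV = """employee_id,name
1,Ada
2,Grace
3,Linus
"""

SYSTEM_B_CSV = """employee_id,department
1,Finance
2,Public Works
"""


def read_dataset(text):
    """Parse a published CSV dataset into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def combine(left, right, key):
    """Relate two datasets on a shared key at the moment they are needed.

    Rows in `left` with no match in `right` are kept as they are, so
    incomplete data does not block the user.
    """
    index = {row[key]: row for row in right}
    combined = []
    for row in left:
        merged = dict(row)
        merged.update(index.get(row[key], {}))
        combined.append(merged)
    return combined


# The user decides what to relate only once the need is known.
people = read_dataset(SYSTEM_A_CSV)
departments = read_dataset(SYSTEM_B_CSV)
result = combine(people, departments, "employee_id")

for row in result:
    print(row)
```

Note the design choice in `combine`: the third employee has no department record, yet the row still comes through. No schema had to be agreed on in advance for the two sources to be used together.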

Q: And this approach is more dynamic than the data warehouse?

RS: About a thousand times. The data warehouse/relational database model has to pre-determine and “prepare” for every possible need. We turn that on its head: once you know what you need, then grab it. See, the relating and computing are done at the end of the process, as needed, not up front in a futile attempt to prepare for infinite needs.

Q: Is there an analogy that might help us understand the difference between the relational model and the Simpler model? What else works this way?

RS: Remember the early days of the Web, when America Online was the Internet Service Provider goliath?

Q: Of course. We loaded the software from a floppy, paid hourly, wasted time in chat rooms, and got a monthly email from Steve Case telling us AOL subscribership had grown another billion percent.

RS: Right. Well, part of the reason for AOL’s early success was that they created this entire, finite virtual world for you. They served up all the content you “needed” on a nice, colorful, user-friendly platter.

Q: Looking back, they sort of bottled the Web for us, right?

RS: Yes, exactly. And that worked really well for a while. They were on the verge of taking over the world — at their peak they had over 30 million subscribers, which was huge. So what happened to AOL?

Q: Well, I think we all realized that there was an entire Web out there to explore, and that the bottled version AOL had given us was just a small part of it.

RS: Yes! AOL was like being confined to a four-star hotel in Maui. It was really nice and comforting until we realized there was an entire island out there to explore, then we wanted out.

Q: I remember Earthlink’s tagline at that time: “It’s your Internet.” They were countering AOL by just giving you a browser and unfettered access to the Web.

RS: Yes, then eventually Google came along and gave us a good search tool, a starting place, and that was all she wrote. Today if you asked someone if they would let one company provide all of their Web content for them, they’d laugh at you.

Q: So in your analogy, I take it AOL is the data warehouse and Google is simpler/gov?

RS: [smiles] Well, if the GUI fits... It really is an accurate comparison. A data warehouse gives you pre-determined content and limited access, and the simpler/gov model gives you unfettered access. “It’s your data,” you might say.

Q: It also kind of reminds me of the old mainframe paradigm versus the personal computer paradigm.

RS: That’s a great comparison, too. In one you have a small group of experts controlling a slow, bloated, inefficient system, and in the other you have a wide, much more efficient dispersal of power and control. A lot more work gets done.

Q: So you sense the end of the data warehouse era, similar to how a lot of people sensed the end of the mainframe era when PCs started taking off?

RS: Absolutely. Once anyone clearly understands the difference between the two approaches, it’s a no-brainer.

Q: So those who get this now, early on, will be in front of the pack?

RS: Yes. At this stage the visionaries, the thought leaders, and those who just have that special knack for forward thinking are getting it. They will be the ones who instigate the change, scrap the new data warehouse plans, and wave the banner for this newer, simpler, and better way of getting the job done.

Q: Right. Who wants to be the guy that spearheaded the new multi-million dollar mainframe system — in 1991?

RS: [smiles] Not me. I think I saw that guy at a job fair the other day. Wearing an ELO shirt.
