The use of AI to write and maintain computer code has already become the industry norm – there can hardly be any programmers out there who are not seeing some advantages of AI. While there is still dispute about how much AI is actually helping or proving its value, the situation is changing every week with advancements not only in the foundation LLMs, but also in the various IDEs, AI tools and AI-enabled editors.
Writing computer code is almost entirely a text and semantics problem – the areas in which LLMs excel. Producing code that is statistically similar to code that has been written before – i.e. recreating semantic and text patterns that have already been published somewhere – is well within the core capabilities of an LLM. Also remember that computer code has to fit within a tight set of syntactic and semantic rules – the code must be machine readable, and this can easily be checked by dropping the code into an interpreter or compiler.
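As a quick illustration of that checkability (my own sketch, not something from the post itself): generated code can be handed straight to the interpreter's parser, giving an automatic pass/fail signal that has no real equivalent in the messy world of data.

```python
import ast

def is_valid_python(source: str) -> bool:
    """Return True if the source parses as Python, False otherwise."""
    try:
        ast.parse(source)  # the interpreter's own parser does the checking
        return True
    except SyntaxError:
        return False

# Hypothetical snippets an LLM might produce:
valid_snippet = "def add(a, b):\n    return a + b\n"
broken_snippet = "def add(a, b)\n    return a + b\n"  # missing colon

print(is_valid_python(valid_snippet))   # True
print(is_valid_python(broken_snippet))  # False
```

A compiler or type checker takes this further, but even this one-line check gives generated code an objective quality gate that data rarely has.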

What is less mature is the use of generative AI on data. Ask any data professional, and they will tell you that the world of data is inherently messy. Data quality is generally poor (in fact, one of my favourite sayings is "however poor you think your organisation's data quality is, the reality is very likely worse"). What makes the situation more challenging is that while computer code might be undocumented, the program flow can always be understood. The messy world of data is far less clear – tables, attributes and schemas may have confusing, obfuscated or simply incorrect names, and unpicking this mess is time consuming and expensive.
Using LLMs and generative AI in this data world is a very different problem. The purpose of this blog is to investigate and explore how AI can be used with data in databases, and where it should, and should not, be used.
The world of generative AI is extremely exciting. It’s not a panacea for our data challenges but it can accelerate and enable uses of data that were previously nigh impossible.
Over the next few months, I hope to build a view of how generative AI can help data professionals and database professionals in their day-to-day jobs, and how it can make them more efficient and more effective for their organisations.
I have been working with data and information management since 1988, using everything from mainframes to modern phones, and from 1980s network databases to modern AI-enabled vector databases, and I'm excited to look at how AI will change the world of data once more.
I currently work for Oracle in their database development organisation.