Businesses of all sizes have been unable to escape the incredible impact AI has recently had on the way we do business.
From corporations to SMEs, organizations are becoming faster, more agile and more resilient as we outsource administrative and repetitive tasks to our AI colleagues.
One of the latest AI trends is the establishment of Large Language Models (LLMs) in the public domain: machine learning algorithms that are trained on huge amounts of data to recognize the structures and patterns of natural language. They are proficient in Natural Language Processing (NLP), which allows us to explore huge data sets through everyday questions or commands.
As a result, LLMs have become the most familiar face of AI — to take the best-known example, LLMs are the means by which ChatGPT can answer your questions. However, this intelligence comes with a significant drawback: it lives in a kind of time capsule.
LLMs are trained intensively, with millions upon millions of data points fired at them in a constant feedback loop to teach each model how to recognize specific data points or patterns. But “operationalizing” an LLM – taking it out of the training loop and putting it online as part of your infrastructure – prevents it from learning anything new. Even early versions of ChatGPT will politely explain their own knowledge cutoff if you ask a question about very recent events.
This means you need to be confident in the systems the LLM will draw on and the data available to it. And while a corporate giant may have the funding and technology stack to make this possible, it’s a bold assumption for an SMB.
Move it or lose it
In the past, we tended to think of data as static. When a layperson downloads a file onto their PC, the file isn’t “there” until it appears in their documents folder — even though millions of individual bytes of data have quietly assembled into something far more complex.
With this mindset, you can understand why companies have often chosen to collect as much data as possible and only then determine what they have actually collected. Convention would have us dumping data into a giant data warehouse or lake, spending ages cleaning and preparing that data, and then digging up various pieces for analysis — a method commonly known as batch processing.
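The collect-first, analyze-later pattern described above can be sketched in a few lines. This is a minimal illustration, not any particular product’s API: the “warehouse” is just a list, and the batch job is a single expensive pass over everything gathered so far.

```python
# Minimal sketch of the batch pattern: raw records pile up in a
# "warehouse" (here just a list), and cleaning plus analysis only
# happen later, in one pass over the entire collection.
warehouse = []  # stand-in for a data warehouse or lake

def collect(record):
    """Step 1: dump raw data; no processing happens yet."""
    warehouse.append(record)

def run_batch_job():
    """Step 2 (much later): clean and analyze the whole set at once."""
    cleaned = [r for r in warehouse if r.get("amount") is not None]
    total = sum(r["amount"] for r in cleaned)
    return {"records": len(cleaned), "total": total}

collect({"amount": 10})
collect({"amount": None})  # dirty record, silently dropped at batch time
collect({"amount": 32})
result = run_batch_job()   # insights only arrive when the job runs
```

Note the delay this implies: nothing is known about the data — not even that one record was unusable — until `run_batch_job` is finally executed.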
That’s about as efficient as it sounds. Tackling an entire data set duplicates work, obscures insights, and imposes enormous demands on hardware and power consumption—all while delaying important business decisions. For SMEs looking for ways to make up for limited resources and staff, this method undermines the agility and speed that should be their natural advantage.
Until recently this wasn’t a problem, because there was little need for information to be consumed — or even captured — in real time. But considering how many new companies’ end-customer value propositions are built on real-time data (think of hailing a taxi with Uber or a similar app, and imagine not seeing the “live” map with your driver’s location), this is now a must-have, not a nice-to-have.
Fortunately, LLMs don’t just work on a batch basis. You can interact with data in different ways — and some of those ways don’t require the data to stay still.
Ask and you shall receive
Just as disruptive SMBs seek to topple older and more established companies, data streaming is replacing batch processing.
Data streaming platforms use real-time data “pipelines” to collect, store and use data continuously. The processing, storage and analysis that batch processing makes you wait for can now happen instantly, as the data arrives.
In streaming, this is achieved through so-called event-driven principles, where essentially every change in a data set is treated as an “event” in its own right. Each event can trigger further processing, creating a constant cascade of new information. Instead of having to retrieve data (usually stored in a table somewhere in a database), data sources “publish” their data in real time, and anyone who wants to consume that data simply “subscribes” to it.
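The publish/subscribe relationship described above can be sketched with a toy in-memory broker. Real data streaming platforms (Apache Kafka, for example) add durable storage, partitioning and replay on top of this idea; the `Broker` class and topic name here are purely illustrative.

```python
# Toy in-memory publish/subscribe broker illustrating event-driven
# delivery: every change is an event, pushed to all subscribers the
# moment it is published -- no consumer ever polls a table for it.
from collections import defaultdict
from typing import Callable

class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]):
        """A consumer registers interest in a topic."""
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict):
        """A source announces a change; every subscriber is notified."""
        for handler in self.subscribers[topic]:
            handler(event)

broker = Broker()
seen = []
broker.subscribe("orders", seen.append)  # consume by subscribing
broker.publish("orders", {"id": 1, "status": "created"})
broker.publish("orders", {"id": 1, "status": "shipped"})
# `seen` now holds both events, delivered as they happened
```

The inversion is the point: in the batch world the consumer asks “what is in the table?”, while here the data announces itself to whoever has subscribed.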
All of this can free LLMs from the hard distinction between training and operations. If the model can act on each data point as it arrives, it can also use the correctness of those actions as feedback, continually refining the underlying algorithms that define its purpose.
This means that the LLM can draw on a constantly updated and curated data set, while constantly improving the mechanisms that deliver and contextualize this data. Data isn’t at risk of redundancy or left in a forgotten silo – all you have to do is ask for it!
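One hedged way to picture an LLM drawing on a constantly updated data set is a rolling context window fed by a stream subscription. Everything below is a sketch under assumptions: `LiveContext` and its event shape are invented for illustration, and the prompt it builds stands in for whatever a real model call would receive.

```python
# Hypothetical sketch: a streaming consumer keeps a rolling window of
# the latest events, so every question put to the model is answered
# against fresh data rather than a stale training snapshot.
from collections import deque

class LiveContext:
    """Rolling window of recent events, assembled into a model prompt."""
    def __init__(self, max_events: int = 100):
        self.events = deque(maxlen=max_events)  # old events fall out

    def on_event(self, event: dict):
        """Handler wired to a stream subscription; called per event."""
        self.events.append(event)

    def prompt(self, question: str) -> str:
        recent = "\n".join(str(e) for e in self.events)
        return f"Context (latest events):\n{recent}\n\nQuestion: {question}"

ctx = LiveContext(max_events=3)
for i in range(5):  # five updates arrive; only the newest three survive
    ctx.on_event({"ride_id": i, "driver_eta_min": 7 - i})
live_prompt = ctx.prompt("Where is my driver?")
```

The design choice worth noting is the bounded `deque`: the model is never handed the whole warehouse, only the slice of the stream that is still relevant, which is what keeps the “ask for it” experience fast.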
Cut from the SME cloth
So what does this mean for medium-sized businesses?
On the one hand, it releases the proverbial handbrake. The sheer speed at which LLMs can deliver information across a stream-driven infrastructure allows decision makers to move business forward at the pace they desire without batch processing keeping them in second gear. The agility that enables SMBs to outmaneuver larger players is once again in abundance.
These decisions are made with less doubt and more relevant context than before. Because LLMs understand natural language, accessing specific insights becomes so easy that data streaming can inspire real enthusiasm for business transparency across the board.
Not only is output faster and more accurate, but SMBs can also free themselves from outdated technology. Data streaming can be entirely on-premises, entirely in the cloud, or a mix of both. The high-performance hardware often required for batch processing is simply no longer necessary when you can get the same result from an LLM in record time. Additionally, there are several providers offering fully managed (turnkey) solutions that do not require any capital investment from SMEs.
For SMEs to get the most out of LLMs, they need to think about the way they handle company data. When a business is willing to treat data as a continuous flow of information, it is in a much better position to maximize the potential of the data in motion to help it evolve.
Carlos Roman
Carlos is a passionate leader with over 25 years of experience launching cloud, software and hardware teams in emerging and mature markets. He specializes in helping emerging companies go beyond their ambitions. He joined Confluent in 2021 after being a key part of Oracle’s cloud sales teams for more than two decades.

