LLM and AI Agents: Solution Architecture Considerations

2025 and beyond are considered the era of AI LLM agents. There is a lot of buzz in the tech world related to LLM (Large Language Model) agents. Investment in agent frameworks and agents is skyrocketing. The hype builds day by day: agents can do magical things, we are very close to AGI (Artificial General Intelligence), and so on.

LLMs are not as great and intelligent as advertised. For many applications, a simple prompt is not enough; a couple of layers of software and intelligent management of the prompt are needed. The agent fits into this space. The architectural considerations of using agents in a larger application need thorough understanding and careful implementation. Only then will the magical intelligence happen.

It is high time to look at this whole topic from an overall system perspective. There are many articles going into specific low-level details of LLMs, but I have not seen an attempt to look at all the layers of a solution and bring them together.

That is exactly what I will do in this post. We will examine the following:

  • LLM’s ML Model
  • LLM with API
  • LLM Agent Layer
  • Overall Application

After a brief review, we will put these together and understand the solution considerations.

Let us start with the LLM's ML model.

LLM ML Model

LLMs are just “next-word-prediction” ML models. Since they have absorbed a huge amount of vocabulary and context, they can predict the most appropriate token sequences. These tokens are decoded to give the output text.

In large language models (LLMs), a token is essentially a chunk of text that the model processes as a unit. Tokens can be whole words, sub-words, or even single characters, depending on the model’s tokenizer and language. When you input text into an LLM, the model breaks it down into these tokens to analyze and generate responses.
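To see what tokenization looks like in practice, here is a small sketch using the Hugging Face transformers library with the GPT-2 tokenizer (one common choice; every model family ships its own tokenizer, so the exact splits differ across models):

```python
from transformers import AutoTokenizer

# Load the tokenizer that shipped with a particular model (GPT-2 here).
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Tokenization splits text into sub-word units."
tokens = tokenizer.tokenize(text)   # sub-word strings
ids = tokenizer.encode(text)        # the integer IDs the model actually sees

print(tokens)  # e.g. ['Token', 'ization', 'Ġsplits', ...]
print(ids)
```

Note how a word like "Tokenization" may be split into multiple sub-word tokens, while common words often map to a single token.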

In this post, we will not get into model architecture; there are many articles online on the Transformer and other architectural elements. We will focus on the deployment and design perspectives.

From a deployment perspective, the ML model would consist of files like these:

  • Weights: pytorch_model.bin, model_weights.ckpt
  • Vocabulary: vocab.txt, merges.txt, tokenizer.json
  • Software Layer: config.json, generation_config.json, run_inference.py
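Taking the Hugging Face stack as one example, here is a minimal sketch of what such a run_inference.py could contain; the directory name is illustrative, and real deployments add batching, streaming, quantization and so on:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "./model"  # directory holding the weight, vocabulary and config files

tokenizer = AutoTokenizer.from_pretrained(model_dir)     # reads tokenizer.json / vocab files
model = AutoModelForCausalLM.from_pretrained(model_dir)  # reads weights + config.json

prompt = "The air quality in the building is"
inputs = tokenizer(prompt, return_tensors="pt")

# generate() picks up defaults from generation_config.json if present
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```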

As we know, the output text is generally good if the prompt contains the right context and clear instructions. It is also well known that the output may not be good enough the first time.

If a human is directly interacting with the model, they can evaluate the response and alter the prompt. The quality of the output depends on how well the user applies their mind and domain expertise, so they must refine the prompt based on the latest response to get the best answer.

When applications interface with LLMs and provide a service to end users, they need to bring their unique strengths. These could be domain expertise, evaluation expertise, decisions on whether to refine the prompt and get a better response, and so on. Applications may also need to decide whether to rely on the external world, such as an in-house DB or the internet, to give refined prompts. In other words, applications should mimic how a human would do the prompting.

Hence logic external to the LLM's model is very much necessary for real-world applications where an expert human is not doing the prompting directly.

The artificial intelligence comes from evaluating the output of the ML model against real-world needs and refining the prompt based on that.
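As a sketch of that evaluate-and-refine loop, consider the following; call_llm() and evaluate() are hypothetical placeholders for your LLM client and your domain-specific quality check, not a real API:

```python
def call_llm(prompt: str) -> str:
    # placeholder: swap in your actual LLM API call here
    return "stub response for: " + prompt[:40]

def evaluate(response: str) -> float:
    # placeholder: domain-specific scoring (format checks, fact checks, range checks)
    return 0.5

def answer_with_refinement(task: str, max_rounds: int = 3, threshold: float = 0.8) -> str:
    prompt = task
    best = ""
    for _ in range(max_rounds):
        response = call_llm(prompt)
        score = evaluate(response)
        if score >= threshold:
            return response  # good enough: stop early
        best = response
        # refine: feed the weak answer back with corrective instructions
        prompt = (f"{task}\n\nYour previous answer was:\n{response}\n"
                  f"It scored {score:.2f}. Improve the accuracy and follow the required format.")
    return best
```

This is exactly what a human expert does by hand; the application has to encode the same judgment in evaluate() and in the refinement instructions.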

LLM with API

From a programming-interface perspective, LLMs provide a package and an API. The LLMs released since 2024 are more sophisticated than older ones: they have a larger context window and, instead of just one input text field, they provide at least three parameters:

  • System role
  • Assistant/Agent role
  • User prompt

All of these together form the input text.

The system role is kept at the beginning of the final prompt to the LLM. It gives the overall context, and hence it is always part of the prompt, even if there are multiple prompts during the agent's execution.

The assistant role is used for passing text related to the agent's functionality and the agent's prior responses. Depending on the intent, the LLM client implementing the agent may retain or alter parts of it. Functions to be called from the LLM's API are also passed along with this. ‘Function calling’ is a feature that lets the LLM obtain external information through a function call made by the API software layer. It is triggered by ‘trigger tokens’ in the output tokens generated by the LLM: the software layer checks the output tokens for the trigger criteria and then calls the function passed via the assistant parameter. Function calling also helps with ‘guided generation’; one use case is enforcing an output format, such as ensuring the output is valid JSON. You can find more application examples in the upcoming sections.

The user prompt is the part that may change with every prompt.

All of the above together create the final prompt for the LLM, while the LLM itself is just a normal ML model that knows nothing about agents, system roles etc.
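Here is a minimal sketch of how the three roles and a function definition are passed, assuming the OpenAI Python SDK (other providers expose very similar parameters); the get_weather function and the model name are illustrative examples:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One function the model may ask the software layer to call (JSON Schema format).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical function for illustration
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[
        {"role": "system", "content": "You are an air-quality assistant."},
        {"role": "assistant", "content": "Text from the previous agent turn goes here."},
        {"role": "user", "content": "Is the air safe in the building today?"},
    ],
    tools=tools,
)

msg = response.choices[0].message
if msg.tool_calls:  # the model emitted the 'trigger tokens' for a function call
    print("Model requested:", msg.tool_calls[0].function.name)
else:
    print(msg.content)
```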

LLM Agent Layer

Now comes the agent.

Many tech companies are funded specifically to build powerful, intelligent agents.

Even with the software-layer enhancements of LLMs, a lot still cannot be achieved through one or two prompts. It needs more sophisticated work breakdown, evaluation of the prompt, and rewriting of the user prompt if necessary. The response must also be evaluated and followed up with a refined prompt. Relevant information must be retrieved from external sources. Depending on the responses from the LLM and from the external world, different decisions need to be taken.

All of this is complex. Meeting the product requirements takes a state machine just to manage the LLM and the related AI.

Thus, an LLM agent has a lot of work to do.

  • Retrieving: retrieving relevant data from a vector DB to be provided within the prompt. In RAG applications, the context may span many documents with many pages, and everything cannot fit within the prompt. Retrievers may be needed to intelligently filter out the relevant parts (see the sketch after this list).
  • LLM prompting: since an agent comes in when tasks are complex and expectations are high, the prompt may be created and run dynamically. The main objective may also be broken down into smaller prompts.
  • LLM function calling: the concept of function calling was explained earlier. The functions may do web search or other tasks to be called from within the LLM's software layer. The agent can make a function call itself or facilitate passing the function to the LLM's API.
  • External tasks to be done at the agent level. These could be web searches, DB queries, REST API calls and many more.
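To make the retrieving step concrete, here is a self-contained sketch that ranks stored document chunks by cosine similarity to the query. A real agent would use a vector DB (Chroma, FAISS, pgvector etc.) and a learned embedding model; the embed() below is a bag-of-words stand-in so the example runs on its own:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # stand-in embedding: word counts (use a real embedding model in practice)
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list, k: int = 2) -> list:
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]  # only the top-k relevant chunks go into the prompt

docs = ["Replace the filter every 90 days.",
        "CO2 above 5000 ppm indicates a sensor fault.",
        "The lobby is repainted annually."]
print(retrieve("co2 sensor fault threshold", docs))
```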

Using the above functionality, very smart agents can be built. Here is an example from the IoT and smart-campus domain.

Example Agent — Requirements: Smart Building Air Quality Agent

Smart building air quality sensor IoT device: it needs to enrich the sensor information with local weather information, translate it to another language, double-check the overall message, and then send it to a remote operator. In addition, the local node needs to read a big maintenance document and ensure that the sensor values do not fall outside a realistic range. If they do, it also needs to add a maintenance alert.

In the above, the ‘air quality sensor IoT’ device needs to do many things: invoking the LLM, passing a function to the LLM for verifying the language translation, doing web searches, accessing a large maintenance document and answering from it, and so on.

Design of Agent — Concept Summary:

Before we continue with the example agent, we need to understand a few concepts. Agent libraries provide a lot of support for designing an agent; some provide agent frameworks, vector DBs and other pieces outside the LLM.

Agent frameworks also provide GUI-based tools to create the agent.

For example, LangChain provides this graph definition and execution ability through LangGraph. There is visual assistance to create the graph (the state machine that governs the agent's behaviour) and to run and debug the agents.

In this graph:

  • Node: Each node represents an agent action. A node corresponds to a Python function that carries out the node's main work.
  • Edge: An edge connects nodes. It is a Python function through which control passes from one node to another; its main job is to decide which node should be executed next.
  • State: The state is the object passed between nodes. Its contents depend on the design of the graph; it could hold LLM output, intermediate results, incremental results and so on.
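To make these three concepts concrete, here is a minimal sketch using LangGraph (assuming the langgraph package; the node names and logic are illustrative only):

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):  # the State object passed between nodes
    reading: float
    alert: str

def check_sensor(state: AgentState) -> AgentState:  # a Node: does the main work
    state["reading"] = 142.0  # stub: a real node would query a sensor or an LLM
    return state

def raise_alert(state: AgentState) -> AgentState:   # another Node
    state["alert"] = f"Air quality alert: reading {state['reading']}"
    return state

def route(state: AgentState) -> str:                # an Edge: decides what runs next
    return "raise_alert" if state["reading"] > 100 else END

graph = StateGraph(AgentState)
graph.add_node("check_sensor", check_sensor)
graph.add_node("raise_alert", raise_alert)
graph.set_entry_point("check_sensor")
graph.add_conditional_edges("check_sensor", route)
graph.add_edge("raise_alert", END)

app = graph.compile()
print(app.invoke({"reading": 0.0, "alert": ""}))
```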

The work outlined in the previous section, such as LLM function calling and web search, would go into the nodes and edges.

The most important work in the workflow should be done in the nodes. Edges can also do some work if it is relevant to the edge; mainly it depends on the conceptual design of the agent's graph and nodes.

The nodes can be of different types:

  • A main data-processing node that runs the LLM prompt,
  • A tool node that queries an external source, or
  • A retriever node that fetches relevant data from a vector DB.

Different libraries provide node-type classes for these.

The agent interfaces with the LLM through the published APIs. Since 2024, most LLMs provide a large context window and separate parameters for the system, assistant and user roles.

With older models, the agent frameworks managed these internally and sent a combined prompt to the LLM.

AI is only one part of the story. What about the overall application integration?

Example Agent — Design of Smart Building Air Quality Agent

In the previous sections we saw the concepts of agent nodes and edges. Here we do a rough design of the agent: first, a high-level summary of a crude workflow; second, identification of a few nodes; third, the agent edges. We are not generating the graph here, since LangGraph's visual tooling requires the paid LangSmith component; we stop at listing the graph details. Also, the main goal of this article is not to illustrate agent design specifics but to bring out the solution considerations of the framework.

Smart Building Air Quality Agent: Workflow Steps

  • The sensor reports air quality levels below the acceptable level, giving levels of CO2, particulates in the air etc.
  • A maintenance step checks levels such as CO2 against thresholds in the sensor maintenance document, to ensure the sensor is not raising a false alarm due to clogged filters or other sensor parts.
  • A weather node retrieves humidity, temperature, traffic conditions etc. This serves as auxiliary information to confirm the sensor's findings.
  • The combined data is flagged for a possible air quality issue (the sensor fusion step). An alert is generated if thresholds are breached, and at this step language translation is done through an LLM prompt.
  • The verified alert message is sent to the operator (human in the loop).
  • Based on the human operator's confirmation, it is sent to a larger set of recipients.

Smart Building Air Quality Agent: Graph Nodes

  • #1 Primary Sensor Node: The main sensor that detects multiple elements of air quality. Input: sensor readings (CO2, particulates in the air etc.). Output: structured sensor data. Tools used: preprocessing functions to validate the raw data and normalize it for processing.
  • #2 Maintenance Check Node: Input: the sensor data. Output: maintenance-specific readings, such as filter dirtiness, plus its own ‘plausibility level’ considering the sensor readings and the maintenance-related readings. Tools used: reading of local maintenance features.
  • #3 Weather Information Node: Input: location coordinates from the local sensor. Output: current local weather information (e.g., temperature, humidity, traffic). Tools used: external API call for weather data retrieval.
  • #4 Sensor Fusion Node: Input: outputs from the primary sensor, maintenance check and weather information nodes. Output: a confirmed air quality assessment with an explanation of the supporting data. Tools used: threshold checks and fusion logic over the combined data.
  • #5 Translation Node: Input: the alert text from sensor fusion. Output: the alert translated into the operator's language. Tools used: an LLM prompt for translation, with a verification function passed to the LLM.
  • #6 Human Confirmation Node: Input: the translated alert. Output: the overall air quality alert with human operator confirmation. Tools used: HMI (Human Machine Interface) integration, such as a REST API, to send and receive alert data.
  • #7 Alert Broadcast Node: Input: the operator-approved alert. Output: confirmation of broadcast to the intended recipients. Tools used: broadcast APIs.

Smart Building Air Quality Agent: Graph Edges

In the above graph, the primary sensor node is connected to the maintenance node through an unconditional edge function; likewise there is an edge from the maintenance node to the weather information node. Conditional edges from the #1 primary sensor, #2 maintenance and #3 weather information nodes lead to the further nodes, which are connected sequentially.

The final step loops back to node #1, indicating continuous operation. A wiring sketch of this graph follows.
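Here is a hedged wiring sketch of that graph with LangGraph; the node bodies are stubs and the routing conditions are illustrative assumptions, not a definitive implementation:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AQState(TypedDict):
    readings: dict
    plausible: bool
    alert: str
    approved: bool

def primary_sensor(s):     s["readings"] = {"co2_ppm": 1450, "pm25": 180}; return s
def maintenance_check(s):  s["plausible"] = s["readings"]["co2_ppm"] < 5000; return s
def weather_info(s):       s["readings"]["humidity"] = 62; return s        # external API in practice
def sensor_fusion(s):      s["alert"] = "PM2.5 high; weather supports the reading"; return s
def translation(s):        s["alert"] = "[translated] " + s["alert"]; return s  # LLM prompt in practice
def human_confirmation(s): s["approved"] = True; return s                  # HMI / REST in practice
def alert_broadcast(s):    print("broadcast:", s["alert"]); return s       # broadcast API

g = StateGraph(AQState)
for name, fn in [("primary_sensor", primary_sensor), ("maintenance_check", maintenance_check),
                 ("weather_info", weather_info), ("sensor_fusion", sensor_fusion),
                 ("translation", translation), ("human_confirmation", human_confirmation),
                 ("alert_broadcast", alert_broadcast)]:
    g.add_node(name, fn)

g.set_entry_point("primary_sensor")
g.add_edge("primary_sensor", "maintenance_check")  # unconditional edges
g.add_edge("maintenance_check", "weather_info")
# conditional edge: proceed to fusion only if the reading is plausible
g.add_conditional_edges("weather_info",
                        lambda s: "sensor_fusion" if s["plausible"] else END)
g.add_edge("sensor_fusion", "translation")
g.add_edge("translation", "human_confirmation")
# conditional edge: broadcast only on operator approval
g.add_conditional_edges("human_confirmation",
                        lambda s: "alert_broadcast" if s["approved"] else END)
g.add_edge("alert_broadcast", END)  # production would loop back to primary_sensor

app = g.compile()
app.invoke({"readings": {}, "plausible": False, "alert": "", "approved": False})
```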

The above is a high-level design of what an agent can do and how it can be approached. This example is inspired by the air quality problems faced by Delhi and the rest of the Indo-Gangetic plain in the Indian subcontinent.

Example Agent Design — Key Takeaways:

The main point is that agent design carries a lot of consideration for the overall application. It certainly has multiple prompts, quality checks etc. (not shown in the design above). But there is a huge consideration for overall functionality and external-world interaction.

Hence the overall application considerations are a big part of agent design.

Overall Application

Remember that the overall solution is the overall goal and commitment to the end user. It may have its own functionality and state machine. This opens up multiple ways of integrating with LLMs.

Integration Option: A Dedicated Agent Layer

One option is to follow the current trend: use a framework for the AI agent, and integrate it with the application as the application's needs dictate. The integration could be via REST APIs, local direct calls or other software-layer integration options.

Integration Option: Direct Integration without an Agent Framework

The second is to interface directly with the LLM API: just integrate with the LLM's APIs and packages.

Considerations for Using an Agent Framework

From what we have explored so far, the following are the pros and cons of using an agent framework, as opposed to integrating the LLM's API directly.

Pros – using an agent framework instead of merging the functionality directly into the overall solution:

  • It decouples the LLM and AI from the rest of the application. This helps with talent management and makes switching across LLMs easier.
  • The agent infrastructure improves productivity.
  • An LLM agent's state machine would be pretty similar to other agents'. Implementing the agent functionality directly in the application may end up complicating it; the application state machine could become overly complicated and hard to maintain.

Cons:

  • Making the agent framework generic tends to involve more tokens, which directly increases cost.
  • Some frameworks are ‘sticky’: they end up dragging the entire application to expensive cloud hosting. If there is no clear path to revenue, ensure there is enough cost runway for the project.
  • In some cases, an agent framework may be overkill. The functionality to be achieved could be accommodated within the main application's state machine. If the application can interface directly with the LLM's API, that works for many cases.

This is a rapidly evolving area; frameworks get updated every few days. It is better to constantly monitor the releases and changes.

Wrapping it up,

Overall, AI agents bring a lot of opportunities to innovate. Some solutions that were never possible before are now possible. The AI agent is here to stay.

IMF AI Readiness Index is Misleading for India

Any talk of AI now feels so clichéd, but this one caught my attention and I thought it needed commentary. It comes from the IMF (International Monetary Fund): a ranking of governments on their preparedness with regard to AI.

IMF AIPI (AI Preparedness Index)

This ranking is from the IMF, and India's ranking is very low. When India wants to be counted among the top 5 or top 10 on every major metric, this comes as a rude shock. Ideally, Indians need not be concerned with it; it is just somebody else's opinion, and to be fair, the IMF has put a disclaimer on its ranking.

However, there is a deeper problem. When the IMF publishes ranks, the Indian government takes note and goes into hyperactive mode. It gives officials an immediate target to improve the rating; panic mode sets in, and officials start rushing to improve the ranking. The concept behind the ranking methodology will then drive the direction of the entire country, with far-reaching consequences. Hence the foundation and methodology need review.

India AI Ranking

IMF Ranking Methodology

The IMF AIPI index has the following parameters, taken from the IMF's own site: digital infrastructure, human capital, technological innovation, and legal frameworks.

If you think about these parameters, they measure AI adoption readiness. This can also be seen as an AI impact indicator: if a segment is already digital, both its adoption readiness and the impact on it would be high; if a segment is not digital, both the readiness and the risk (impact) would be low. Hence the index does not make much sense.

Problem with IMF Indicators: Example of the Meat Industry

Let us take the example of the meat business, a $3.4 billion industry in 2020. Meat production and slaughterhouses have not been modernized and have never been connected to IoT. If powerful lobbies continue to control this industry, the sector will not be modernized. The “inefficiencies” will remain due to the lack of standardization, and hence the jobs owing to those ‘inefficiencies’ will remain unaffected by AI. That means there is nothing to panic about.

But as per the indicators above, how will this industry's AI readiness be rated? The ranking method will give an extremely low rating to the preparedness of the meat industry. So the animal husbandry and food departments will get a very low rating because of the meat industry, even though AI will not be killing jobs there. In fact, any attempt to get a better rating will eventually result in job losses due to AI.

Fallacy of Indices and Ranking

‘Studies’, indices and rankings are often more misleading than practically helpful.

Looking back at the rise of big tech: all kinds of indices have ranked the USA high for at least 30 years. Yet in the last 20 years, American society has been severely affected by the flight of jobs, the death of local businesses and more. In hindsight, something was not right about rankings in general.

Now let us focus on AI. The biggest issue is that AI means different things to different segments, and the nature of its impact can be very diverse. Secondly, the government is not one entity; it is an organization of organizations, meant to serve many kinds of multi-level organizations.

Coming to India, the preparedness of the government and its various departments is very different from the preparedness of private-sector employers. Among private employers, the prominent ones are tech services and non-tech services, which again differ from the products and manufacturing sectors. Society as a whole is yet another animal. Pinning everything down to one number will not represent anything meaningful. Even if we look only at the government's preparedness, the SWOT would be very different for each department.

The interests, challenges and opportunities are very different for jobseekers, existing jobs at risk from AI, industries, society and government. The IMF index does not have the right approach, at least for India, and moving in its direction will not help India.

Competence Needed for AI Impact Assessment

Impact assessment of AI needs a solid understanding of AI and imagination about what it can do. At the same time, it needs good AI/ML implementation experience, as well as the mindset to go beyond AI and see the domain as it is. Hands-on experience gives a realistic view and a realistic timeline of when the risks or opportunities would materialize. Panic and rhetorical responses may not be the best way.

Given India's strengths, it can come up with a pragmatic framework to assess AI readiness.

Initial Steps Towards AI Readiness Index

Initially, all the relevant domains need to be identified. These would roughly follow the government departments and the portfolios beneath them: the HRD ministry, railways, defense... you name it. Each would require specific focus.

An AI readiness index should start by listing the major distinct domains and industries. Then there should be a domain-wise SWOT (Strengths, Weaknesses, Opportunities & Threats). From that, specific actions can be drawn, and thereafter plans for those actions can be defined. This large canvas of SWOTs and plans gives a much better view of AI readiness.

With proper weightage for each, a robust and realistic AI readiness index can be achieved. We at Manomaya AI Systems would be glad to be part of such initiatives.

ChatGPT: Implications for Knowledge Management In Organizations

‘I switched overnight from googling to ChatGPT.’

‘I used to go to stackoverflow.com every day. It has been weeks since I visited the site. When I posted questions, I had to wait days for answers. Moreover, there are “high-ranking” users who keep commenting on how I formed the question rather than answering it. Not anymore.’

There is euphoria

There is euphoria in the tech industry about ChatGPT, just as in the rest of the segments. In the domains of social sciences, politics and culture, we are aware of the ‘AI bias’ issues. But when it comes to tech, the facts can be verified, and this ability to verify has caused a massive shift to ChatGPT for technical answers.

At Manomaya, we keenly observed ChatGPT's answers and performance across many technical and business domains. We observed the question-answer patterns across different kinds of questions: open popular technologies, advanced specialized domains, semi-proprietary topics where companies have advanced company-specific technologies built on top of public technologies, and a lot more.

The benefits of ChatGPT are well articulated in loads of articles and videos. The biases are also well discussed. However, the productivity improvements and the productivity losses are not really discussed well.

Here we dive deep:

  • Want code? You got it.
  • Want to find a defect? An explanation of the logic, refactoring...? You got it all.

Most of our team is now addicted to ChatGPT; it is the go-to place. Only if ChatGPT cannot give the answer do members explore the remaining options, such as Stack Overflow.

This is great news for productivity. Or is it?

In the past few days, we saw that in some cases it still took several days to solve a problem. Since we use ChatGPT, it is supposed to be faster, right?

Here are some of the observations.

  • When we had better domain knowledge, we got better results. When we included the right domain-specific and solution-specific keywords, the answers got drastically better.
  • We heavily use Google Cloud and Azure. There have been many, many updates since 2021, and we get misleading answers about those. Since this is a well-known issue, we can live with it.
  • The believable confidence: ChatGPT gives very confident answers sometimes; at other times it explicitly says it is not confident. This makes the confidence believable, so we tend to trust it when it is confident. If we are not competent to verify the answers, we are likely to go down a rabbit hole. In that sense, it is an automation tool rather than a reliable guide; it cannot substitute for missing competencies.
  • Last week a team member was arguing with me. I had an improvement suggestion for the code; he refused to make the change. His defense was clear: ChatGPT said it is correct. The concept of best practice is at risk, and proving a point convincingly is going to take a lot of time.

With Google search, members had to qualify the search results by going through the content, the domain authority of the source and so on. All of that is now bypassed. Your team member has built blind faith and an emotional bond with the bot, and only after trying for several days does the topic come up for escalation.

The biggest issue is the loss of exploration habits and analysis skills.

Engineers who start their careers in the ChatGPT era are at a disadvantage. Some tasks that used to require understanding and analysis can now be done directly by ChatGPT, and neither the employer nor the customers will wait for the engineer to pick up domain knowledge and system design skills.

This poses two challenges to organizations:

  • Over the next years, engineers who never got to analyze and build at the grassroots level will move up the ladder. When faced with a diverse set of challenges, their weak foundations may cause issues for the entire organization.
  • Internally, organizations may have solutions that are more effective. These will never be reused; team members will keep repeating the same mistakes, and internal knowledge management will suffer.

Way Forward for Organizational Knowledge Management

Many companies use proprietary frameworks, which will continue to require internal knowledge. When they hire engineers from outside who are wired to working with ChatGPT, those engineers may struggle. Internal technologies are often documented in plain documents; if the habit of reading and writing documents is lost in the talent market, it will create a productivity crisis.

Even if an organization has not built a proprietary library, there is still a proprietary code base. Particular design patterns and styles may have been used; the error handling, the logging, the hierarchy of control flow and a lot more may be standardized. Pasting ChatGPT output may violate much of this. With stackoverflow.com, engineers still had to think more before adopting an online solution.

Organizations now have to start looking at building internal knowledge-management bots. These need not be huge language models with billions of machine-learning parameters.

Authentic AI research voices have stated that a small model with a specific focus may outperform ChatGPT or other large models, provided it is trained on meaningful data aligned with the purpose of the bot.

The hidden cost of using solutions with no recorded source can be quite high; it is simply not measured right now. Another area of productivity loss is knowledge transfer, including the loss of knowledge due to employee attrition. A huge variety of issues and impacts arise from the lack of knowledge transfer.

But this cost alone can justify the ROI of a specific internal bot.

Wrapping it up

It is up to organizations to start investing in this; they won't regret the investment. The practices of knowledge transfer during employee churn would also be transformed. Domain chatbots will be an area of explosive growth and investment in the coming years.

For Better AI, Think Outside the Model

Last year, we had a client who needed a test bench solution. The client was excited about AI; he had finished an elaborate program on machine learning himself. He now understood machine learning algorithms, neural networks, cost functions, weights, biases, optimizers, data engineering and a lot more.

The project was a test bench solution involving complex embedded hardware. The users were the testers, who were domain experts too. The main focus was the performance of testing a particular signal; I won't reveal whether it was multimedia, a biometric sensor, a simple sensor or some other complex sensor. The signal comes ‘up’ to the test logic of the test bench through complex layers of drivers.

I have nothing surprising to show you in a diagram. Test bench solutions tap inputs from multiple levels, which are then used in the test logic to fulfil the needs of the test case. On top of this basic need, further features like reporting and import/export are added to the test solution.

Here comes the AI. The AI-enthusiast product owner proposes inserting AI into the solution; the senior management is skeptical. Sure, they have pressure to show something on AI for the annual report, but they also want real value out of the investment.

The issue with many AI evangelists is that they talk more about AI than about the solution and the specific value coming from AI.

Continuing with our story, the product owner somehow got approval for the AI-based tests project. That was only the beginning.

The Challenges

As the AI-based tests were developed, the product experts started raising questions about both the reliability of AI and the value from it. The domain experts know a lot: they can explain exactly how the current system works, and when something does not work, they can pinpoint the issue. Sometimes, too much knowledge also results in rigidity and makes them oppose anything new. Moreover, AI seems like an enigma, whereas with code it is easy to go through and understand the logic.

In our case, the product owner had two challenges: 1. find good answers to the objections raised; 2. influence the experts, or, if that is not possible, convince the management against the expert opinion.

Finding good answers on AI value and reliability needs a good team. If your AI team is obsessed only with machine learning, there is a problem. In this case, we spent a lot of time building the non-AI solution and almost became domain experts ourselves. That was possible because we were not just into AI technologies; we were a systems and solutions team rather than a bunch of ML fanatics.

The specifics of the solution are irrelevant. The key point is that the product experts and end users could see the AI and non-AI results side by side, so there was no reason to object. Their demand for a traditional solution was taken care of, and as a bonus they could see the AI solution too. Over time, they got used to seeing the benefits, and the resistance gradually faded away.

The Solution

Both challenges were overcome because the AI team worked as a system engineering team that cared about the solution first. It had perspective from the sensor and the device driver all the way up to the application-level microservices. It also cared enough to define the workflow for gathering data and semi-automating the labeling.

I am surprised that many data scientists see ML as an exclusive, distinct solution instead of seeing it as one of the building blocks of the overall solution. For effective AI, the business workflow, the user workflow and the solution development workflow must all be well thought through.

Any AI project should be seen as a full-fledged system engineering project that integrates many different workflows. Seeing it only as a demonstration of data science competencies can lead to failure, which would harm data science initiatives across the organization, not just in one project.

Give confidence that the data science team is here to solve a problem, not just to do data science for its own sake. That goes a long way in gaining acceptance.