The GenAI Building Blocks

Technology Foundations of Generative AI: Architectures, Algorithms, and Innovations

  • 31 Jan 2024

Written by Norbert Freitag, Michael Berns, Stefan Pühl and Tobias Gräber.

Gen AI, or generative artificial intelligence, is a type of AI that creates new content or data, such as text, images, video, or music, based on existing inputs. It sets itself apart from previous AI innovations by using Multimodal Large Language Models (MLLMs), also called Large Multimodal Models, which can process and generate natural language as well as other modalities.

Gen AI can furthermore leverage massive amounts of unlabeled data from the internet, advanced computational power of GPU processors, and refined training methods to achieve breakthrough improvements in language prediction, document analysis, content generation, and other tasks.
Gen AI is already supporting human interfaces, software apps, and creative team capabilities across various business functions and industries, enabling convergence, augmentation, and automation of digital programs.

Fundamental Components of Generative AI Systems

To realize strategic convergence, augmentation, or automation with human-led generative AI, the following high-level solution building blocks should be considered. Together they address the computation, capability, control, and connectedness that human-led generative AI needs in order to drive new or existing digital programs.

Illustration: Technology Foundations of Generative AI - PwC

The End-User Interface, also called the frontend, connects the user with the capabilities of the generative AI solution for a specific task within a use case. Various interfaces are possible (e.g. chatbot or voice interaction).

Digital Platform Ecosystems consist of the existing IT landscape connected through APIs and the extended ecosystem based on solutions on the market from other vendors if needed to realize the task performed by the end user.

The Orchestration Platform coordinates communication across different MLLMs and, through an API gateway, with the digital platform ecosystem and the corporate data, realizing the use case on the basis of an adaptive architecture.

Data Sources comprise the enterprise and public data used to configure the MLLM and make it specific to the use case and task, through additional fine-tuning, light prompting, or Retrieval Augmented Generation (RAG) in the orchestration.

Multimodal Large Language Models (MLLMs) form the foundation of content generation. Multiple MLLMs can be used for different task-specific or data-specific purposes, together with safety filters and guardrails, i.e. software controls that ensure compliance.

Cloud Computing Infrastructure is needed to run the language models, other applications, and databases. It consists of virtual and physical components (e.g. storage, servers, and networking).
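
To make the Retrieval Augmented Generation (RAG) step mentioned under Data Sources more concrete, here is a minimal sketch in Python. It assumes a toy in-memory document list and a simple bag-of-words cosine similarity in place of a real embedding model and vector store; the names `retrieve`, `build_prompt`, and `DOCS` are illustrative and do not come from any specific product.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Augment the user question with retrieved enterprise context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical enterprise documents, for illustration only.
DOCS = [
    "Travel expenses must be submitted within 30 days of the trip.",
    "The cafeteria is open from 8 am to 3 pm on weekdays.",
    "Expense reports above 500 EUR need approval from a line manager.",
]

prompt = build_prompt("When do I have to submit travel expenses?", DOCS, k=1)
```

The resulting prompt would then be passed to the MLLM by the orchestration platform. A production setup would typically replace the word-count similarity with learned embeddings, but the flow of retrieve, augment, and generate stays the same.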

Modalities in the realm of GenAI

The language models, applications, and interfaces are multimodal - bringing language in combination with other media, especially images, graphics, video, and audio:

Text Generation
This involves creating coherent and contextually relevant text. Models are capable of writing essays, poems, and even code.

Image Generation
AI models can create images from textual descriptions, showcasing AI’s creativity.

Audio Generation
AI systems can generate music, sound effects, and even replicate human voices with remarkable accuracy.

Video Generation
AI can produce animated sequences or alter existing videos, a field that’s growing rapidly.

Data Synthesis
Generative AI can create synthetic data for training machine learning models, especially useful in domains where data is scarce or sensitive.

3D Model Generation
AI is used in generating 3D models for various applications, including gaming and architecture.

Language Translation
Advanced AI models offer real-time, context-aware translation services, breaking down language barriers.

Interactive Conversational Agents
AI can power sophisticated chatbots and virtual assistants, providing human-like interactions.

Each modality demonstrates the versatility and potential of Generative AI in transforming various industries and creative domains. However, to unlock the full potential of Gen AI the combination of multiple modalities is key.

The Evolution of (Generative) AI Technologies

The Inception and Early Days

1950s:
Alan Turing asks whether machines can think and proposes the imitation game (1950); John McCarthy coins the term “artificial intelligence” for the 1956 Dartmouth workshop. Early AI research focuses on symbolic methods and problem-solving.

1960s:
The development of the first neural networks marks the beginning of machine learning, although limited by the computational power of the time.

1970s:
AI faces its first winter due to inflated expectations and limited progress.

Revival and Expansion

1980s:
AI experiences a revival with the rise of expert systems, capable of mimicking the decision-making process of human experts.

Late 1980s:
The backpropagation algorithm revitalizes neural network research.

1990s:
Machine learning, especially in the form of decision trees, support vector machines, and simpler neural networks, becomes increasingly practical for solving real-world problems.

The Era of Big Data and Advanced Algorithms

Early 2000s:
The emergence of big data and increased computational power fuels AI advancement.

2006:
Geoffrey Hinton’s work on deep learning and deep belief networks ignites significant interest in deep neural networks.

2010s:
The explosion of data, further advancements in computational power, and improvements in algorithms lead to the rapid growth of deep learning.

Breakthroughs and Mainstream Integration

2011:
IBM’s Watson wins Jeopardy!, demonstrating advanced natural language processing capabilities.

2012:
AlexNet, a deep convolutional neural network, wins the ImageNet competition, revolutionizing the field of computer vision.

2014:
The development of Generative Adversarial Networks (GANs) by Ian Goodfellow enables new possibilities in generative models.

2015:
AlphaGo, developed by DeepMind, defeats a professional human player in the board game Go, a task considered extremely challenging for AI.

2016:
Reinforcement learning gains traction in a growing range of applications, including robotics and gaming, showcasing AI's problem-solving capabilities.

Advancements in Language Processing and Autonomy

The evolution of Generative AI technologies, particularly following Google’s introduction of the Transformer model, marks a significant trajectory in AI development.

2017:
Google’s Transformer Model: Google introduces the Transformer model, revolutionizing natural language processing with its attention mechanism, laying the groundwork for future generative models.

2018:
OpenAI’s GPT: OpenAI releases the first Generative Pre-trained Transformer (GPT), showcasing the potential of transformers in language generation.

2018:
Google’s BERT: Google releases BERT (Bidirectional Encoder Representations from Transformers), emphasizing the importance of understanding context in language processing.

2019:
GPT-2: OpenAI unveils GPT-2, an improved version with larger training data and capacity, demonstrating more sophisticated language understanding and generation.

2019:
Google’s T5 and Facebook’s RoBERTa: Enhancements to Transformer technology continue with Google's T5, focusing on task-agnostic training, and Facebook AI's RoBERTa, an optimized version of BERT.

2020:
GPT-3: OpenAI introduces GPT-3, a significantly larger and more powerful model, setting new standards in language generation and versatility.

2021 and beyond:
Multimodal AI Systems: The advent of multimodal AI systems like OpenAI’s DALL-E, capable of generating images from textual descriptions, showcases the expanding capabilities of Generative AI.

2022 and beyond:
Specialized and Efficient Models: The focus shifts towards creating more specialized and efficient models, tailored for specific tasks or industries, while addressing issues of computational requirements and accessibility.

2022 and beyond:
ChatGPT: OpenAI’s ChatGPT, released in November 2022, brings Generative AI to the masses and triggers an extremely fast increase in global interest and adoption.

Ongoing:
AI research pushes the boundaries of what’s possible, with advancements in quantum computing, AI-driven drug discovery, and more.

Throughout this journey, each technological advancement in AI has built upon the last, leading to exponential growth in capabilities. The AI field continues to evolve, driven by ongoing research, innovation, and a keen focus on ethical and responsible development.

The CTO perspective on GenAI technology

As a CTO, making informed choices about technology is crucial, especially when it comes to emerging fields like Generative AI. The potential of Generative AI to create realistic and creative content is immense, but it requires careful consideration.

Factors such as scalability, ethical implications, information security and data privacy must be weighed against the benefits of using Gen AI. By understanding the technology’s capabilities and limitations, CTOs can make strategic decisions that align with their organization’s goals and values.

Open Source, Foundation Models and the use of Hyperscalers

The following points can be used as a practical guide to choosing options for your business:

Overview and comparison: cloud-based vs. self-hosted
Comparing Generative AI options across Open Source, Foundation Models, and Hyperscalers, and contrasting cloud-based with self-hosted solutions, requires a solid understanding of each:

Open Source Generative AI

  • Pros:
    Potentially lower cost, highest flexibility regarding training and embedding
  • Cons:
    Requires more technical expertise, limited support, potential for security vulnerabilities, domain dependent and can potentially be less accurate than foundation models without additional training.
  • Best For:
    Mid-sized to large businesses with technical capabilities, businesses seeking customization for optimized results.
  • Things to consider:
    The open source developer community might be stronger in certain aspects. Strong technical capabilities need to be available in the organization to design, build, implement and operate a complete pipeline.

Foundation Models

  • Pros:
    Advanced capabilities, ongoing development, support and updates, reliable
  • Cons:
    Can be expensive, less customizable (which is already changing as we type).
  • Best For: 
    Businesses needing cutting-edge capabilities without extensive AI development resources.
  • Things to consider:
    Foundation models offer a broad variety of access capabilities and ecosystems to work in context.

Hyperscalers

  • Pros:
    Scalable, robust, rich application & infrastructure ecosystem, strong support.
  • Cons:
    Cost can be high, risk of a potential vendor lock-in.
  • Best For:
    Large businesses or those needing scalable, reliable AI solutions.
  • Things to consider:
    Hyperscalers offer access to foundation models, open source models, or both. Depending on the number of use cases and the applicable technology options, one or several (multi-cloud) hyperscalers may be necessary.

Cloud-Based Solutions

  • Pros:
    Easier to scale, lower upfront and operational costs, regular updates, strong support.
  • Cons:
    Ongoing costs, information security and/or data privacy considerations (e.g. an EU business with US-only cloud hosting)
  • Best For:
    Businesses without large IT infrastructures or those needing scalability and flexibility.
  • Things to consider:
    Governance and risk topics need to be worked through before accessing cloud based solutions.

Self-Hosted Solutions

  • Pros:
    Full control over data and infrastructure, potentially most secure.
  • Cons:
    Upfront costs, requires in-house expertise, ongoing operational and maintenance costs.
  • Best For:
    Large enterprises with sensitive data, businesses with the capability to manage IT infrastructure.
  • Things to consider:
    Besides providing the highest degree of flexibility for any use case, this kind of implementation requires the most effort from the company itself, including resources for development and operations.

Cloud-based solutions offer ease and scalability, while self-hosted options provide control and potential security benefits. Open Source is cost-effective but technically demanding, whereas foundation models offer out-of-the-box advanced capabilities. Hyperscalers provide integrated, scalable solutions.

In summary, choosing the best option depends on factors such as the number of planned use cases and the governance (data protection etc.) required by the organization, which together indicate the demand. This demand must then be matched against the cost of supply, i.e. technical expertise (in-house or external), hardware and infrastructure (in-house or external), and ecosystems.

Businesses must weigh these factors against their needs, capabilities, and resources to make the best choice.

Popular cloud AI vendors

Illustration: Popular Cloud Vendors 2023 - PwC
Illustration: Generative AI – Models and platforms - PwC

Conclusions: Recommendations for GenAI Operations

CTO’s guide to Addressing Scalability and Computational Efficiency

Organizational adaptability and challenges

Generative AI is a powerful software solution that can create new content and insights from data, such as text, images, audio, and video. It has immense potential for growth and productivity, as it can augment human capabilities, automate complex tasks, and converge different functions and domains. However, generative AI also poses significant challenges for scalability and computational efficiency, as it requires massive amounts of data, advanced computational power, and sophisticated algorithms to train and run large multimodal models. Therefore, organizations need to adapt their strategies, architectures, and processes to leverage generative AI effectively and responsibly.

One of the key aspects of organizational adaptability is to align the generative AI solutions with the existing digital platform ecosystems and industry clouds, which provide the customer and employee experience, the business model, and the software-as-a-service capabilities. This can help to integrate generative AI into the products and services, expose the intended behavior to other applications and devices, and orchestrate the complex dataflows to the underlying language models. Moreover, this can help to leverage the existing cloud computing infrastructure, which can provide the needed GPU power and availability for the generative AI models.

Another aspect of organizational adaptability is to address the control and governance of generative AI, which can ensure the quality, reliability, and ethics of the generative AI outputs and interactions. This can involve creating software and organizational controls for the user input and model output, such as filters, plausibility checks, summaries, and overruling cases. It can also involve establishing a generative AI architecture board and a generative AI monitoring office, which can structure, monitor, and innovate the generative AI solutions from a business and technical perspective.
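
As a rough illustration of such software controls for user input and model output, the following Python sketch wraps a model call with an input filter and an output redaction step. The blocked topics, the e-mail pattern, and the `model` callable are assumptions chosen for the example; real guardrails would be considerably more thorough.

```python
import re
from typing import Callable

# Hypothetical policy: topics the assistant must not handle.
BLOCKED_TOPICS = ("password", "credit card")

# Simple pattern for e-mail addresses; an assumption, not a complete PII detector.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def check_input(prompt: str) -> tuple[bool, str]:
    """Input control: reject prompts that touch a blocked topic."""
    lowered = prompt.lower()
    for topic in BLOCKED_TOPICS:
        if topic in lowered:
            return False, f"blocked topic: {topic}"
    return True, "ok"


def redact_output(text: str) -> str:
    """Output control: mask e-mail addresses before showing the answer."""
    return EMAIL.sub("[redacted email]", text)


def guarded_call(prompt: str, model: Callable[[str], str]) -> str:
    """Run the model only if the input passes, then filter its output."""
    allowed, reason = check_input(prompt)
    if not allowed:
        return f"Request refused ({reason})."
    return redact_output(model(prompt))


# Stand-in for a real language model, used only to demonstrate the flow.
def fake_model(prompt: str) -> str:
    return "Please mail jane.doe@example.com for details."


answer = guarded_call("Who handles supplier contracts?", fake_model)
refusal = guarded_call("What is the admin password?", fake_model)
```

Keeping the controls outside the model call means they can be reviewed, logged, and overruled independently of whichever language model sits behind them.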

By adapting to the impact of generative AI, organizations can overcome the scalability and computational efficiency challenges and unlock the opportunities for convergence, augmentation, and automation. However, this also requires a human-led, scenario-based strategy, which can identify the relevant use cases, stakeholders, and outcomes for generative AI, and balance the trade-offs between the costs, benefits, and risks.

Data Flows and Data Pipelines

In the context of Generative AI (Gen AI), data pipelines and dataflows are critical components.

Data pipelines in Gen AI are sequences of processing steps through which data is transformed and transported for AI model training and deployment. They include data collection, cleaning, transformation, and loading, and are essential for handling large volumes of data efficiently and effectively.

Dataflows, in turn, refer to the movement and transformation of data through the pipeline. They are crucial for ensuring that the right data reaches the AI models in the correct format.
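
The collection, cleaning, transformation, and loading steps can be sketched as a small chain of Python functions. This is a simplified illustration with made-up records and an in-memory list as the sink; the function names are hypothetical and stand in for whatever tooling an organization actually uses.

```python
from typing import Callable, Iterable, Iterator

Record = dict
Step = Callable[[Iterable[Record]], Iterator[Record]]


def collect() -> list[Record]:
    """Collection: gather raw records (hard-coded here for illustration)."""
    return [
        {"id": 1, "text": "  Invoice overdue since March.  "},
        {"id": 2, "text": ""},
        {"id": 3, "text": "Customer asks for a REFUND."},
    ]


def clean(records: Iterable[Record]) -> Iterator[Record]:
    """Cleaning: trim whitespace and drop empty records."""
    for record in records:
        text = record["text"].strip()
        if text:
            yield {**record, "text": text}


def transform(records: Iterable[Record]) -> Iterator[Record]:
    """Transformation: bring records into the format the model expects."""
    for record in records:
        yield {"id": record["id"], "prompt": record["text"].lower()}


def load(records: Iterable[Record], sink: list[Record]) -> list[Record]:
    """Loading: write the prepared records to the target store."""
    sink.extend(records)
    return sink


def run_pipeline(source: Iterable[Record], steps: list[Step], sink: list[Record]) -> list[Record]:
    """The dataflow: pass the data through each step in order, then load it."""
    data: Iterable[Record] = source
    for step in steps:
        data = step(data)
    return load(data, sink)


training_set = run_pipeline(collect(), [clean, transform], [])
```

Because each step only consumes and yields records, steps can be added, removed, or reordered without touching the others, which is the property that makes pipelines manageable at larger data volumes.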

LLM selection and computational efficiency

In an ideal scenario, companies seeking the optimal Large Language Model (LLM) for specific applications would implement a comprehensive evaluation process that contrasts various facets of different models. This comparison might encompass commercial and open-source LLMs, or even the development of proprietary ones. Examples of models for benchmarking include ChatGPT, LLaMA, Claude, PaLM 2, Vicuna, and MPT, among others.

Illustration: LLM selection and computational efficiency - PwC

Total Cost of Ownership
How much will it cost to use / develop, support and scale the LLM? This includes not just the direct costs of using the models (like licensing or subscription fees) but also indirect costs like computational resources needed and potential costs related to integrating the model into existing systems.
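
A back-of-the-envelope TCO comparison can be expressed in a few lines of Python. The cost categories follow the paragraph above, but every figure below is invented purely for illustration, and the class name `LlmCostModel` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LlmCostModel:
    """Direct and indirect cost drivers of one LLM option (all values assumed)."""
    license_per_month: float      # licensing or subscription fees
    price_per_1k_tokens: float    # usage-based fees
    tokens_per_month: int         # expected usage volume
    compute_per_month: float      # computational resources
    integration_one_off: float    # integration into existing systems
    support_per_month: float      # support and operations

    def monthly_cost(self) -> float:
        usage = self.tokens_per_month / 1000 * self.price_per_1k_tokens
        return (
            self.license_per_month
            + usage
            + self.compute_per_month
            + self.support_per_month
        )

    def total_cost(self, months: int) -> float:
        """One-off integration cost plus recurring costs over the period."""
        return self.integration_one_off + months * self.monthly_cost()


# Two hypothetical options with made-up numbers.
hosted_api = LlmCostModel(
    license_per_month=0.0,
    price_per_1k_tokens=0.02,
    tokens_per_month=50_000_000,
    compute_per_month=0.0,
    integration_one_off=20_000.0,
    support_per_month=1_000.0,
)
self_hosted = LlmCostModel(
    license_per_month=0.0,
    price_per_1k_tokens=0.0,
    tokens_per_month=50_000_000,
    compute_per_month=3_000.0,
    integration_one_off=60_000.0,
    support_per_month=4_000.0,
)

for name, option in [("hosted API", hosted_api), ("self-hosted", self_hosted)]:
    print(f"{name}: {option.total_cost(months=12):,.0f} over 12 months")
```

The point of such a model is less the absolute numbers than seeing how the ranking shifts when usage volume or the time horizon changes.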

Illustration: LLM Application TCO - PwC

Adaptability
Define Specific Use Cases and Criteria: A key step is to clearly define the specific use cases for the LLM. These could range from customer service automation to content generation or complex data analysis. The performance of LLMs varies depending on the specific task, from chatbot assistance to solving coding or reasoning problems, so the evaluation criteria should be picked accordingly. The chosen LLM should not only meet current needs but also be scalable for future requirements. The corporation should consider the model’s ability to handle increasing loads and the potential for ongoing improvements and updates from the provider.

In general, there are three ways to modify or adapt LLMs:

In-context only:
  • The foundation model is used as-is, with no model modification
  • The task is achieved through prompt and context modification only

Fine-tuning:
  • The LLM is frozen, but task layers are modified
  • The model is adapted using input-output pairs

Domain adaptation:
  • The full LLM is updated
  • Model weights are adapted using a large domain-specific corpus
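
The first option, in-context adaptation, can be illustrated with a short Python sketch that assembles a few-shot prompt. No model weights are touched; the task is conveyed entirely through the prompt. The function name, the instruction text, and the example tickets are assumptions made for this illustration.

```python
def build_few_shot_prompt(
    instruction: str,
    examples: list[tuple[str, str]],
    query: str,
) -> str:
    """Combine an instruction, worked examples, and the new input into one prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Input: {text}", f"Output: {label}", ""]
    # The final "Output:" is left open for the model to complete.
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


# Hypothetical support-ticket classification task.
prompt = build_few_shot_prompt(
    instruction="Classify each support ticket as 'billing' or 'technical'.",
    examples=[
        ("I was charged twice this month.", "billing"),
        ("The app crashes when I log in.", "technical"),
    ],
    query="My invoice shows the wrong amount.",
)
print(prompt)
```

Fine-tuning and domain adaptation, by contrast, require a training run and labeled pairs or a domain corpus, which is why in-context adaptation is usually the cheapest option to try first.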

Essentially, adaptability asks whether the model can be adapted to many use cases: the level of customization needed, the ability to train it, enhanced compute performance, how easily it can be upgraded over time, ease of use based on documentation, the data science effort needed, and so on.

Task Performance
How well does the model perform for a specific task, domain, or set of use cases? This can be assessed using public validation frameworks and/or metrics like F1 score, precision, and recall, alongside cost factors. Accuracy benchmarking measures how well the LLM performs in terms of precision (the proportion of positive identifications that are actually correct), recall (the proportion of actual positives that were identified correctly), and the F1 score (the harmonic mean of precision and recall). The corporation should develop a set of tasks or queries that are representative of real-world scenarios for the LLM, or use public benchmarks like ARC, HellaSwag, MMLU, TruthfulQA, and others.
Ideally, task performance is weighed against cost: the gains of a model with more parameters are set against the additional costs due to its size. If the larger model offers significantly better F1 score, precision, and recall, it might justify a higher cost. However, if the improvement is marginal for the corporation's specific needs, the smaller existing model might be more cost-effective.
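
The three metrics defined above can be computed directly from a labeled evaluation set. The sketch below assumes a binary task in which each model answer is marked as a positive or negative identification; the sample labels are invented for illustration.

```python
def precision_recall_f1(
    expected: list[bool], predicted: list[bool]
) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 for binary labels."""
    true_pos = sum(e and p for e, p in zip(expected, predicted))
    false_pos = sum((not e) and p for e, p in zip(expected, predicted))
    false_neg = sum(e and (not p) for e, p in zip(expected, predicted))

    # Precision: share of positive identifications that are actually correct.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    # Recall: share of actual positives that were identified.
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Hypothetical evaluation set: ground truth vs. model predictions.
expected = [True, True, True, False, False, True]
predicted = [True, False, True, True, False, True]

precision, recall, f1 = precision_recall_f1(expected, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Running the same function over the outputs of a smaller and a larger model on the same evaluation set gives the performance delta that then has to be weighed against the difference in cost.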

Ecosystem
Assessment of necessary software and infrastructure to support the LLM’s operation.

Safety and Security
Evaluating the model’s security in terms of data protection, intellectual property loss, and compliance with data privacy laws and ethical standards is vital. This includes scrutinizing the model for bias and fairness, crucial aspects in corporate decision-making.
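
One simple way to bring the five criteria above together is a weighted scoring matrix. The Python sketch below is only an illustration: the weights, the 1 to 5 scores, and the candidate names `model_a` and `model_b` are all assumptions that each organization would replace with its own assessment.

```python
# Hypothetical weights per selection criterion (they sum to 1.0 here).
CRITERIA_WEIGHTS = {
    "total_cost_of_ownership": 0.25,
    "adaptability": 0.20,
    "task_performance": 0.30,
    "ecosystem": 0.10,
    "safety_and_security": 0.15,
}


def weighted_score(scores: dict[str, float], weights: dict[str, float] = CRITERIA_WEIGHTS) -> float:
    """Weighted average of per-criterion scores (higher is better)."""
    missing = weights.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total_weight


# Made-up scores on a 1 (poor) to 5 (excellent) scale.
candidates = {
    "model_a": {
        "total_cost_of_ownership": 2,
        "adaptability": 3,
        "task_performance": 5,
        "ecosystem": 4,
        "safety_and_security": 4,
    },
    "model_b": {
        "total_cost_of_ownership": 5,
        "adaptability": 4,
        "task_performance": 3,
        "ecosystem": 3,
        "safety_and_security": 4,
    },
}

ranking = sorted(candidates, key=lambda name: weighted_score(candidates[name]), reverse=True)
for name in ranking:
    print(f"{name}: {weighted_score(candidates[name]):.2f}")
```

A matrix like this does not replace the detailed evaluation of each criterion, but it makes the trade-offs explicit and the final choice easier to justify.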


Data Protection and Regulations

Generative AI is subject to ethical, legal, and regulatory requirements in Europe, especially under the GDPR and the proposed AI Act. Depending on the use case and the processing steps, generative AI may involve the processing of personal data in different scenarios, such as when training the AI model, when users input personal data, or when the AI model uses the input data for further training. If personal data is involved, the GDPR sets out obligations for controllers and processors, including lawfulness, fairness, transparency, purpose limitation, data minimisation, accuracy, storage limitation, integrity, confidentiality, accountability, and respect for data subject rights.

However, some of these requirements may be difficult to fulfill in practice, due to the complexity and opacity of generative AI systems. For example, it may be hard to ensure the accuracy and fairness of the data and the outputs, to prevent algorithmic bias and discrimination, to explain the logic and the impact of the automated decisions or profiling, or to apply the data minimisation and storage limitation principles.

The draft AI Act, a final text of which was made available to the public only recently, aims to establish a harmonized framework for artificial intelligence in Europe and to classify AI systems according to their risk. In some areas, the regulation will impose even stricter transparency obligations on generative AI systems, such as the obligation to inform users when they are interacting with or exposed to AI-generated content.

It is worth noting that existing privacy laws around the globe such as the CCPA have comparable requirements and also that many countries aim for comparable AI legislation as the European Union does. The most relevant topics regulators aim to cover are:

  • Ensure Ethical and Responsible AI Development (e.g. in U.S., Canada, India, UK, Brazil, South Korea, UAE, Japan)
  • Establish High-Risk AI System Oversight (e.g. in U.S., Canada, UK)
  • Promote International Collaboration and Standards (e.g. in U.S., Canada, UK)
  • Invest in AI Research and Development (e.g. in U.S., Canada, India, UK, Brazil, South Korea, UAE, Japan)
  • Ensure Public Trust and Accountability (e.g. in U.S., Canada, UK, India)

How PwC can help

PwC, as a community of solvers, builds trust in human-led generative AI and its impact on human interfaces, software apps, and creative teams, with a strategy that focuses on software, investment, and risk. This approach enables organizations to realize the sizable potential of generative AI in their specific convergence, augmentation, and automation opportunities for their digital programs.

Illustration: Leading Human-led Generative AI Initiatives - PwC

GenAI Readiness Assessment

Assessing the organizational readiness for leveraging GenAI at scale to achieve sustainable business values.

Trustworthy GenAI Strategy

Establishing a holistic corporate GenAI Strategy to transform motivation into real business value while ensuring a responsible and trustworthy usage.

GenAI Use Case Discovery

Identification of value-adding use cases as well as transferring those into a standardized and prioritized portfolio followed by strategic implementation.

GenAI Operating Model & Roles

Defining the target operating model and the roles required for the successful integration, ownership, and maintenance of GenAI solutions in enterprise organizations.

GenAI Infrastructure & Data

Enhancing infrastructural depth, maximizing data value with regular monitoring, maintenance, and optimization of technical systems.

Awareness & Enablement

Aligning a holistic change management and education concept with the GenAI integration in order to upskill employees and to shape awareness across the organization as prerequisite for risk mitigation.

Alliance Ecosystem & Living experience

We at PwC are working closely with our established alliance partners like Microsoft and Google to access the latest environments and to bring AI to our clients.
