Trustworthiness in LLMs

Overview

Trustworthy LLMs are essential for building applications in high-stakes domains like health and finance. While LLMs like ChatGPT are very capable of producing human-readable responses, they don't guarantee trustworthy responses across dimensions like truthfulness, safety, and privacy, among others.

Research Study

Sun et al. (2024) recently presented a comprehensive study of trustworthiness in LLMs, discussing challenges, benchmarks, evaluation, analysis of approaches, and future directions.

Key Challenge

One of the major challenges of taking current LLMs into production is trustworthiness. The survey proposes a set of principles for trustworthy LLMs that span 8 dimensions, and it includes a benchmark across 6 of those dimensions (truthfulness, safety, fairness, robustness, privacy, and machine ethics).

Benchmark Framework

The authors proposed the following benchmark to evaluate the trustworthiness of LLMs on six aspects:

A Benchmark of Trustworthy Large Language Models

Below are the definitions of the eight identified dimensions of trustworthy LLMs.

Dimensions of Trustworthy LLMs

The framework evaluates LLMs across multiple trustworthiness dimensions to ensure comprehensive assessment.

Research Findings

This work also presents a study evaluating 16 mainstream LLMs on TrustLLM, a benchmark consisting of over 30 datasets. Below are the main findings from the evaluation:

Model Performance Comparison

Proprietary vs Open-Source: While proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, there are a few open-source models that are closing the gap.
Advanced Capabilities: Models like GPT-4 and Llama 2 can reliably reject stereotypical statements and show enhanced resilience to adversarial attacks.
Open-Source Performance: Open-source models like Llama 2 perform close to proprietary ones on trustworthiness without using any special moderation tool. The paper also states that some models, such as Llama 2, are overly calibrated towards trustworthiness, which at times compromises their utility on several tasks and leads them to mistakenly treat benign prompts as harmful inputs.

Key Insights by Dimension

Over the different trustworthiness dimensions investigated in the paper, here are the reported key insights:

1. Truthfulness

LLMs often struggle with truthfulness due to training data noise, misinformation, or outdated information
LLMs with access to external knowledge sources show improved performance in truthfulness

2. Safety

Open-source LLMs generally lag behind proprietary models in safety aspects like jailbreak, toxicity, and misuse
There is a challenge in balancing safety measures without being overly cautious
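The tension between safety and over-caution can be made concrete by comparing how often a model refuses harmful prompts with how often it refuses benign ones. The snippet below is a minimal sketch of that idea, not the paper's method: the `REFUSAL_MARKERS` list, the keyword-matching heuristic, and the sample responses are all hypothetical and exist only for illustration.

```python
# Hypothetical refusal markers. A real evaluation would more likely rely on
# a trained classifier or human judgment than on keyword matching.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i am unable")


def is_refusal(response: str) -> bool:
    """Return True if the response contains any refusal marker."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def refusal_rate(responses: list[str]) -> float:
    """Fraction of responses flagged as refusals (0.0 for an empty list)."""
    if not responses:
        return 0.0
    return sum(is_refusal(r) for r in responses) / len(responses)


# Made-up model outputs, for illustration only.
harmful_responses = [
    "I'm sorry, but I can't help with that.",
    "I cannot assist with this request.",
]
benign_responses = [
    "Sure, here is a recipe for banana bread.",
    "I'm sorry, but I can't help with that.",  # over-refusal of a benign prompt
    "The capital of France is Paris.",
    "Here is a summary of the article.",
]

print("Refusal rate on harmful prompts:", refusal_rate(harmful_responses))
print("Refusal rate on benign prompts:", refusal_rate(benign_responses))
```

Under these assumptions, a well-balanced model would show a high refusal rate on harmful prompts and a low one on benign prompts; a high rate on both suggests the over-cautious behavior described above.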

3. Fairness

Most LLMs perform unsatisfactorily in recognizing stereotypes
Even advanced models like GPT-4 have only about 65% accuracy in this area

4. Robustness

There is significant variability in the robustness of LLMs
Performance varies especially in open-ended and out-of-distribution tasks

5. Privacy

LLMs are aware of privacy norms, but their understanding and handling of private information vary widely
As an example, some models have shown information leakage when tested on the Enron Email Dataset

6. Machine Ethics

LLMs demonstrate a basic understanding of moral principles
However, they fall short in complex ethical scenarios

Trustworthiness Leaderboard

The authors have also published a leaderboard for comparing LLM trustworthiness. For example, the table below shows how the different models score on the truthfulness dimension. As mentioned on their website, "More trustworthy LLMs are expected to have a higher value of the metrics with ↑ and a lower value with ↓".

Trustworthiness Leaderboard for LLMs

The leaderboard provides comparative metrics across different trustworthiness dimensions for easy model evaluation.
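When comparing models on leaderboard metrics, the direction of each metric matters: some are better when higher (↑) and others when lower (↓). The following is a minimal sketch of ranking models under that convention. The function name `rank_models`, the model names, and all scores are hypothetical placeholders, not values from the actual leaderboard.

```python
def rank_models(scores: dict[str, float], higher_is_better: bool) -> list[str]:
    """Return model names ordered from most to least trustworthy on one metric.

    higher_is_better=True corresponds to a metric marked with an up arrow,
    False to a metric marked with a down arrow.
    """
    return sorted(scores, key=lambda name: scores[name], reverse=higher_is_better)


# Placeholder scores, for illustration only.
accuracy_up = {"model_a": 0.71, "model_b": 0.64, "model_c": 0.80}    # higher is better
error_rate_down = {"model_a": 0.12, "model_b": 0.30, "model_c": 0.05}  # lower is better

print(rank_models(accuracy_up, higher_is_better=True))
print(rank_models(error_rate_down, higher_is_better=False))
```

Keeping the direction as an explicit argument avoids silently ranking a lower-is-better metric the wrong way round.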

Implementation

Code Repository

You can also find a GitHub repository with a complete evaluation kit for testing the trustworthiness of LLMs across the different dimensions.

Code: https://github.com/HowieHwong/TrustLLM

References

Image Source / Paper: TrustLLM: Trustworthiness in Large Language Models (10 Jan 2024)

Key Takeaways

Trustworthiness is multi-dimensional and requires comprehensive evaluation
Proprietary models generally outperform open-source alternatives in trustworthiness
Safety and fairness remain challenging areas for most LLMs
Balancing trustworthiness and utility is crucial for practical applications
Open-source models are improving and closing the gap with proprietary solutions
