Virtue AI
RESEARCH TERMS

We conduct pioneering AI research to enable safe and secure AI.

Red Teaming & Risk Assessments

Pioneering comprehensive AI risk assessment across multiple sectors and languages. Our advanced red teaming algorithms rigorously test AI models and systems, ensuring robust safety measures aligned with global regulations.

Guardrail & Threat Mitigation

Developing cutting-edge, customizable content moderation solutions for text, image, audio, and video. Our guardrails offer transparent, policy-compliant protection with unparalleled speed and efficiency.

Safe Models & Agents

Crafting AI models and agents with inherent safety features, from secure code generation to safe decision-making. We’re integrating safety and compliance directly into AI development processes, setting new standards for responsible AI.

Publications

BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models

Abstract: Large language models (LLMs) are shown to benefit from chain-of-thought (COT) prompting, particularly when tackling tasks that require systematic reasoning processes. On the other …

COLEP: Certifiably Robust Learning-Reasoning Conformal Prediction via Probabilistic Circuits

Abstract: Conformal prediction has shown spurring performance in constructing statistically rigorous prediction sets for arbitrary black-box machine learning models, assuming the data is exchangeable. However, …

MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos

Abstract: Multimodal summarization with multimodal output (MSMO) has emerged as a promising research direction. Nonetheless, numerous limitations exist within existing public MSMO datasets, including insufficient …

PerAda: Parameter-Efficient Federated Learning Personalization with Generalization Guarantees

Abstract: Personalized Federated Learning (pFL) has emerged as a promising solution to tackle data heterogeneity across clients in FL. However, existing pFL methods either (1) …

ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for Autonomous Vehicles

Abstract: We present ChatScene, a Large Language Model (LLM)-based agent that leverages the capabilities of LLMs to generate safety-critical scenarios for autonomous vehicles. Given unstructured …

InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining

Abstract: Pretraining auto-regressive large language models (LLMs) with retrieval demonstrates better perplexity and factual accuracy by leveraging external databases. However, the size of existing pretrained retrieval-augmented …

Fair Federated Learning via the Proportional Veto Core

Abstract: Previous work on fairness in federated learning introduced the notion of core stability, which provides utility-based fairness guarantees to any subset of participating agents. …

SHINE: Shielding Backdoors in Deep Reinforcement Learning

Abstract: Recent studies have discovered that a deep reinforcement learning (DRL) policy is vulnerable to backdoor attacks. Existing defenses against backdoor attacks either do not …

HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding

Abstract: While large vision-language models (LVLMs) have demonstrated impressive capabilities in interpreting multi-modal contexts, they invariably suffer from object hallucinations (OH). We introduce HALC, a …

Differentially Private Synthetic Data via Foundation Model APIs 2: Text

Abstract: Text data has become extremely valuable due to the emergence of machine learning algorithms that learn from it. A lot of high-quality text data …

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

Abstract: Automated red teaming holds substantial promise for uncovering and mitigating the risks associated with the malicious use of large language models (LLMs), yet the …

Effects of Exponential Gaussian Distribution on (Double Sampling) Randomized Smoothing

Abstract: Randomized Smoothing (RS) is currently a scalable certified defense method providing robustness certification against adversarial examples. Although significant progress has been achieved in providing …