Virtue AI
RESEARCH

We conduct pioneering AI research to enable safe and secure AI.

Red Teaming & Risk Assessments

Pioneering comprehensive AI risk assessment across multiple sectors and languages. Our advanced red teaming algorithms rigorously test AI models and systems, ensuring robust safety measures aligned with global regulations.

Guardrail & Threat Mitigation

Developing cutting-edge, customizable content moderation solutions for text, image, audio, and video. Our guardrails offer transparent, policy-compliant protection with unparalleled speed and efficiency.

Safe Models & Agents

Crafting AI models and agents with inherent safety features, from secure code generation to safe decision-making. We’re integrating safety and compliance directly into AI development processes, setting new standards for responsible AI.

Publications

When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?

Abstract: The integration of new modalities into frontier AI systems offers exciting capabilities, but also increases the possibility such systems can be adversarially manipulated in …

Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality

Abstract: Dataset distillation aims to minimize the time and memory needed for training deep networks on large datasets, by creating a small set of synthetic …

Identifying Spurious Biases Early in Training through the Lens of Simplicity Bias

Abstract: Neural networks trained with (stochastic) gradient descent have an inductive bias towards learning simpler solutions. This makes them highly prone to learning spurious correlations …

SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models

Abstract: Despite the effectiveness of data selection for large language models (LLMs) during pretraining and instruction fine-tuning phases, improving data efficiency in supervised fine-tuning (SFT) …

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

Abstract: Generative Pre-trained Transformer (GPT) models have exhibited exciting progress in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …

TextGuard: Provable Defense against Backdoor Attacks on Text Classification

Abstract: Backdoor attacks have become a major security threat for deploying machine learning models in security-critical applications. Existing research endeavors have proposed many defenses against …

Can Pruning Improve Certified Robustness of Neural Networks?

Abstract: With the rapid development of deep learning, the sizes of neural networks become larger and larger so that the training and inference often overwhelm …

Shake to Leak: Amplifying the Generative Privacy Risk through Fine-tuning

Abstract: While diffusion models have recently demonstrated remarkable progress in generating realistic images, privacy risks also arise: published models or APIs could generate training images …

Improving Privacy-Preserving Vertical Federated Learning by Efficient Communication with ADMM

Abstract: Federated learning (FL) enables distributed resource-constrained devices to jointly train shared models while keeping the training data local for privacy purposes. Vertical FL (VFL), …

Ring-A-Bell! How Reliable are Concept Removal Methods For Diffusion Models?

Abstract: Diffusion models for text-to-image (T2I) synthesis, such as Stable Diffusion (SD), have recently demonstrated exceptional capabilities for generating high-quality content. However, this progress has …

DP-OPT: Make Large Language Model Your Differentially-Private Prompt Engineer

Abstract: Large Language Models (LLMs) have emerged as dominant tools for various tasks, particularly when tailored for a specific target by prompt tuning. Nevertheless, concerns …

Effective and Efficient Federated Tree Learning on Hybrid Data

Abstract: Federated learning has emerged as a promising distributed learning paradigm that facilitates collaborative learning among multiple parties without transferring raw data. However, most existing …