AI Testing Nonprofit Flags Reliability Gaps Amid Push for National Standards

Former Meta executive said foreign actors may shape training data to influence model outputs.

Campbell Brown (left), co-founder and CEO of Forum AI, speaks with moderator Michal Lev-Ram at the AI in America Summit in Washington on Wednesday, Dec. 3, 2025.

WASHINGTON, Dec. 3, 2025 — A nonprofit focused on testing artificial intelligence models said Wednesday that generative systems remained unreliable in high-stakes situations and required independent accuracy standards.

Campbell Brown, co-founder and CEO of Forum AI and a former Meta executive, told The Hill’s AI in America summit that large language models still struggle to separate credible reporting from unverified material, sometimes giving think-tank analyses and Reddit threads equal weight.

Brown said the systems “have a hard time finding signal in the noise,” a weakness that can mislead teenagers seeking mental-health guidance or adults relying on AI during major events.

Brown said her organization developed expert-validated benchmarks to evaluate model performance in politics, geopolitics, health, and personal finance.

She said the effort aimed to create systematic measures of reliability as AI systems become embedded in search tools, productivity platforms, and decision-support software.

Brown warned that foreign actors could try to shape model responses by producing content designed to influence what AI systems surface first.

She said tactics once used to manipulate social-media algorithms are now being aimed at model training data and real-time outputs, creating heightened risks during elections or geopolitical crises when users expect accurate and up-to-date information.

Brown said that amid recent scrutiny of “woke AI,” major model companies declared their systems unbiased based only on their own internal reviews, a practice she called unsustainable as AI becomes embedded in critical decisions. She added that any federal framework should require companies to disclose the datasets their models rely on, how they test for accuracy, and how performance changes after major updates.

Brown concluded that independent auditing would be necessary as AI systems expand, adding that sectors affecting millions of users “should not be allowed to regulate themselves.”
