Three months after its formation, OpenAI’s Safety and Security Committee has become an independent board oversight committee and has made its initial safety and security recommendations for OpenAI’s projects, according to a post on the company’s website.
Zico Kolter, director of the machine learning department at Carnegie Mellon’s School of Computer Science, will chair the board, OpenAI said. The board also includes Quora co-founder and chief executive Adam D’Angelo, retired U.S. Army general Paul Nakasone, and Nicole Seligman, former executive vice president of Sony Corporation.
OpenAI announced the Safety and Security Committee in May, after disbanding its Superalignment team, which was dedicated to controlling AI’s existential dangers. Ilya Sutskever and Jan Leike, the Superalignment team’s co-leads, both resigned from the company before the team was disbanded.
The committee reviewed OpenAI’s safety and security criteria and the results of safety evaluations for o1-preview, its newest AI model that can “reason,” before it was launched, the company said. After conducting a 90-day review of OpenAI’s security measures and safeguards, the committee made recommendations in five key areas that the company says it will implement.
Here’s what OpenAI’s newly independent board oversight committee is recommending the AI startup do as it continues developing and deploying its models.
“Establishing Independent Governance for Safety & Security”
OpenAI’s leaders will have to brief the committee on safety evaluations of its major model releases, as it did with o1-preview. The committee, alongside the full board, will also be able to exercise oversight over OpenAI’s model launches, meaning it can delay the release of a model until safety concerns are resolved.
This recommendation is likely an attempt to restore some confidence in the company’s governance after OpenAI’s board ousted chief executive Sam Altman in November. The board said at the time that Altman “was not consistently candid in his communications with the board.” Despite a lack of transparency about exactly why he was fired, Altman was reinstated days later.
“Enhancing Security Measures”
OpenAI said it will add staff to build “around-the-clock” security operations teams and continue investing in security for its research and product infrastructure. After the committee’s review, the company said it found ways to collaborate with other companies in the AI industry on security, including by developing an Information Sharing and Analysis Center to report threat intelligence and cybersecurity information.
In February, OpenAI said it found and shut down OpenAI accounts belonging to “five state-affiliated malicious actors” using AI tools, including ChatGPT, to carry out cyberattacks.
“These actors generally sought to use OpenAI services for querying open-source information, translating, finding coding errors, and running basic coding tasks,” OpenAI said in a statement. OpenAI said its “findings show our models offer only limited, incremental capabilities for malicious cybersecurity tasks.”
“Being Transparent About Our Work”
While it has released system cards detailing the capabilities and risks of its latest models, including for GPT-4o and o1-preview, OpenAI said it plans to find more ways to share and explain its work around AI safety.
The startup said it developed new safety training measures for o1-preview’s reasoning abilities, adding that the models were trained “to refine their thinking process, try different strategies, and recognize their mistakes.” For example, in one of OpenAI’s “hardest jailbreaking tests,” o1-preview scored higher than GPT-4.
“Collaborating with External Organizations”
OpenAI said it wants more safety assessments of its models done by independent groups, adding that it is already collaborating with third-party safety organizations and labs that are not affiliated with the government. The startup is also working with the AI Safety Institutes in the U.S. and U.K. on research and standards.
In August, OpenAI and Anthropic reached an agreement with the U.S. government to give it access to new models before and after their public release.
“Unifying Our Safety Frameworks for Model Development and Monitoring”
As its models become more complex (for example, it claims its new model can “think”), OpenAI said it is building on its existing practices for launching models to the public, with the goal of establishing an integrated safety and security framework. The committee has the power to approve the risk assessments OpenAI uses to determine whether it can launch its models.
Helen Toner, one of OpenAI’s former board members who was involved in Altman’s firing, has said one of her main concerns about him was that he misled the board “on multiple occasions” about how the company was handling its safety procedures. Toner resigned from the board after Altman returned as chief executive.