Researchers from startup Patronus AI found that large language models (LLMs) like ChatGPT often fail to accurately answer questions about Securities and Exchange Commission (SEC) filings.
AI Models Have a Long Way to Go
The researchers fed questions to AI models such as OpenAI's GPT-4 Turbo and found that the best-performing model answered correctly only 79% of the time. They also found that the LLMs would either refuse to answer or "hallucinate" figures and facts that were not in the filings.
SEC filings are dense with important information, from financial figures to factual disclosures, all of which must be interpreted accurately. Many of the AI models marketed to companies promise exactly this kind of analysis of corporate financial narratives.
"That type of performance rate is just absolutely unacceptable," Patronus AI co-founder Anand Kannappan said. He also added that the rate needs to be higher for it to perform in an automated and production-ready way.
Patronus AI's Research
During the investigation, Patronus AI built a set of more than 10,000 questions and answers drawn from the SEC filings of large publicly traded companies. To further probe the models' capabilities, some questions required light math or reasoning.
The tests used OpenAI's GPT-4 and GPT-4 Turbo, Anthropic's Claude 2, and Meta's Llama 2. Each model was evaluated under several configurations, including a "closed book" setting with no access to the filing and a long-context setting in which the relevant filing text was supplied alongside the question.
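The long-context setting, in essence, amounts to placing the filing text directly in the prompt and asking the model to answer only from it. Below is a minimal sketch of that idea, assuming the OpenAI Python SDK, a hypothetical local filing excerpt, and the "gpt-4-turbo" model name; it is illustrative only and not Patronus AI's actual evaluation harness.

```python
# Illustrative sketch: ask a model a question grounded in an SEC filing excerpt.
# Assumes the OpenAI Python SDK and an API key in the OPENAI_API_KEY env var.
from openai import OpenAI

client = OpenAI()

# Hypothetical file containing a slice of a 10-K filing.
filing_excerpt = open("example_10k_excerpt.txt").read()
question = "What was the company's total revenue for the most recent fiscal year?"

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {
            "role": "system",
            "content": (
                "Answer only from the provided SEC filing excerpt. "
                "If the answer is not present, say so instead of guessing."
            ),
        },
        {
            "role": "user",
            "content": f"Filing excerpt:\n{filing_excerpt}\n\nQuestion: {question}",
        },
    ],
)

print(response.choices[0].message.content)
```

A benchmark like Patronus AI's would repeat a loop of this kind over thousands of question-answer pairs and score each model response against a known correct answer.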
While the results fell well short of the researchers' expectations, they remain hopeful that future models will improve over time. "But today, you will definitely need to have at least a human in the loop to help support and guide whatever workflow you have," Kannappan stated.