GPT-4 > GPT-3.5-turbo: GPT-4 exhibited a higher correlation with expert assessments in all conditions, demonstrating superior comprehension of financial language.
Bias Issue: GPT models tended to overestimate positive sentiment in corporate disclosures.
Agreement with Experts:
Cohen’s Kappa: 0.352 (within the 0.21 to 0.40 band conventionally read as “fair” agreement)
Simple Agreement Rate: 68%, considered a relatively reasonable level.
Optimal Condition: GPT-4 performed best under the condition where positive sentiment assessments were adjusted downward.
Opinion
This study confirms that LLMs can achieve a certain level of reliability in sentiment analysis of Korean corporate disclosures. Notably, GPT-4 outperformed GPT-3.5, highlighting the potential of using the latest LLMs for financial data analysis. However, the tendency to overestimate positive sentiment necessitates adjustments for practical applications. Additionally, the inability to leverage external data and conduct quantitative analysis is a clear limitation, which must be addressed in future financial AI model development. Ultimately, GPT-4-based sentiment analysis can serve as a valuable supplementary tool but cannot fully replace expert evaluations.
Core Sell Point
GPT-4 proves useful in sentiment analysis of Korean corporate disclosures but requires further refinements due to its bias and data utilization limitations.
"Large Language Models for Semantic Monitoring of Corporate Disclosures: A Case Study on Korea’s Top 50 KOSPI Companies" This paper explores the feasibility of automating sentiment analysis of corporate disclosures from Korea’s top 50 KOSPI-listed companies using large language models (LLMs) such as OpenAI’s GPT-3.5-turbo and GPT-4.
1. Key Objectives
Assess the effectiveness of LLMs in analyzing the sentiment of corporate disclosures.
Compare the performance of GPT-3.5-turbo and GPT-4 models.
Identify challenges and limitations in sentiment analysis using LLMs.
2. Research Methodology
The top 50 KOSPI-listed companies were selected as of June 28, 2023.
Monthly summaries of corporate disclosures from January 1, 2022, to May 31, 2023, were collected.
GPT models were instructed to evaluate key factors such as financial stability, market share, and growth potential.
Sentiment assessments generated by GPT models and financial experts were compared using a scale from 1 (highly negative) to 5 (highly positive).
Inter-rater agreement was measured using Cohen’s Kappa statistic and simple agreement rate.
The correlation between GPT models and human evaluations was analyzed using the Spearman correlation coefficient and Kendall rank correlation coefficient.
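The agreement and correlation measures named above can be sketched in plain Python. This is a minimal illustration rather than the study's own code: the function names are hypothetical, the scores are invented for the example (they are not data from the paper), and the unweighted form of Cohen's Kappa and the tau-b variant of Kendall's coefficient are assumptions, since the summary does not say which variants were used.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def simple_agreement(a, b):
    """Fraction of items on which both raters gave the same score."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = simple_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement, from each rater's marginal score distribution.
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    var_x = sum((p - mx) ** 2 for p in x)
    var_y = sum((q - my) ** 2 for q in y)
    return cov / sqrt(var_x * var_y)


def spearman(a, b):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def kendall_tau_b(a, b):
    """Kendall's tau-b, which corrects for tied scores."""
    conc = disc = ties_a = ties_b = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(a, b)), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both raters: counted nowhere
        if dx == 0:
            ties_a += 1
        elif dy == 0:
            ties_b += 1
        elif dx * dy > 0:
            conc += 1
        else:
            disc += 1
    return (conc - disc) / sqrt((conc + disc + ties_a) * (conc + disc + ties_b))


# Invented 1-to-5 scores for eight disclosures (illustration only).
expert = [1, 2, 3, 4, 5, 3, 2, 4]
model = [2, 2, 3, 5, 5, 4, 2, 4]

print("simple agreement:", simple_agreement(expert, model))
print("Cohen's kappa:", cohens_kappa(expert, model))
print("Spearman rho:", spearman(expert, model))
print("Kendall tau-b:", kendall_tau_b(expert, model))
```

On these invented scores the raters match on 5 of 8 items (0.625), yet Kappa is 0.52, because part of that raw agreement is expected by chance. The same gap explains how a 68% agreement rate can coexist with a Kappa of 0.352.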
3. Key Findings
Superior Performance of GPT-4: GPT-4 exhibited a stronger correlation with human evaluations than GPT-3.5-turbo in all conditions, indicating its superior ability to comprehend and assess complex financial language.
Optimal Condition: GPT-4 achieved the highest performance across all evaluation criteria under Condition 2, which applied adjustments to reduce overly positive sentiment assessments.
Sentiment Bias: GPT models tended to overestimate the positivity of corporate disclosures, suggesting an inclination to interpret content favorably. The study notes that this bias can be mitigated through additional adjustments, such as the downward correction applied in Condition 2.
Agreement with Expert Opinions: Cohen’s Kappa was 0.352 and the simple agreement rate was 68%, a relatively reasonable level of agreement with expert assessments (a Kappa between 0.21 and 0.40 is conventionally read as “fair” agreement).
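The positivity bias described above can be quantified as a mean signed difference between model and expert scores. The sketch below also applies one possible downward correction. The summary does not say how Condition 2 implemented its adjustment, so the rule here (lowering scores of 4 or 5 by one point) is purely a hypothetical stand-in, and the scores are invented for illustration.

```python
def mean_signed_bias(model, expert):
    """Average of (model - expert); positive means the model is too optimistic."""
    return sum(m - e for m, e in zip(model, expert)) / len(model)


def adjust_downward(scores, shift=1, threshold=4, low=1):
    """Hypothetical rule: lower any score at or above `threshold` by `shift`."""
    return [max(low, s - shift) if s >= threshold else s for s in scores]


# Invented 1-to-5 scores for eight disclosures (illustration only).
expert = [1, 2, 3, 4, 5, 3, 2, 4]
model = [2, 2, 3, 5, 5, 4, 2, 4]

adjusted = adjust_downward(model)
print("bias before:", mean_signed_bias(model, expert))
print("bias after: ", mean_signed_bias(adjusted, expert))
```

With these invented scores the bias moves from 0.375 to -0.125, which shows that a blunt shift can overcorrect. Any real adjustment would need to be tuned and validated against expert ratings.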
4. Limitations & Challenges
LLMs lack background knowledge of the analyzed companies, limiting the depth and contextual understanding of their analysis.
They are unable to incorporate external data sources such as financial data tables or news articles.
They have limited capability in understanding and executing complex financial formulas or advanced statistical analysis.
LLM performance may fluctuate over time, leading to inconsistencies.
5. Conclusion
This study demonstrates that LLMs, particularly GPT-4, have significant potential in conducting sentiment analysis on Korean corporate disclosures. However, further research is needed to address the limitations of LLMs, mitigate biases, enhance contextual understanding, integrate external data sources, and improve mathematical analysis capabilities. This research contributes to a better understanding of LLMs’ capabilities and limitations in real-time sentiment monitoring, providing valuable insights for both academia and industry practitioners.
[Compliance Note]
All posts by Sellsmart are for informational purposes only. Final investment decisions should be made with careful judgment and at the investor’s own risk.
The content of this post may be inaccurate, and any profits or losses resulting from trades are solely the responsibility of the investor.
Core16 may hold positions in the stocks mentioned in this post and may buy or sell them at any time.