Data Modeling

Mathematical and Computational Data Modeling
RESEARCH ARTICLE   (Open Access)

Does Financial News Sentiment Really Move Stock Prices? A Machine Learning Investigation Using Sentiment Analysis and LSTM Forecasting

Iffat Jahan


Data Modeling 3 (1) 1-8 https://doi.org/10.25163/data.3110810

Submitted: 25 October 2022 Revised: 10 December 2022  Published: 20 December 2022 


Abstract

Whether news sentiment genuinely moves markets, or merely appears to in hindsight, remains a surprisingly unsettled question, one this study revisits empirically rather than theoretically. Stock market behavior has long resisted prediction from historical price data alone, partly because unstructured signals such as financial news and social commentary inject volatility that purely numerical models tend to miss. Drawing on a large labeled corpus of financial headlines alongside seven years of company-level closing-price data, we built a two-stage system: first, a sentiment classifier (trained using Naive Bayes and Support Vector Machine algorithms) that scores news as positive, negative, or neutral; and second, a Long Short-Term Memory network that forecasts next-day price movement using both that sentiment score and historical closing price. The sentiment model reached roughly 75% accuracy, respectable though not without room to grow, and, more tellingly, the forecasting model that incorporated sentiment outperformed the price-only baseline, suggesting sentiment is not just incidental noise but a meaningful predictive input. Correlational analysis reinforced this, showing closing price and news sentiment moving together on most observed trading days. These findings sit comfortably alongside, and to some extent extend, prior evidence from other exchanges and modeling approaches. What emerges is a cautious but fairly clear takeaway: text-derived sentiment adds real, if still imperfect, value to quantitative stock forecasting, and refining sentiment accuracy may be the most direct path toward sharper predictions.

Keywords: Stock market prediction, financial news sentiment, Sentiment analysis, Long Short-Term Memory (LSTM), Support Vector Machine, Naive Bayes, Time series forecasting

1. Introduction

The stock market is an important part of a country's financial sector and creates opportunities for investors and companies alike. It is also volatile and difficult to predict. Anticipating trends in market behavior helps traders decide whether to buy, hold, or sell: to earn a profit, a trader needs to buy stocks whose value is expected to rise in the near future and sell those whose price is expected to fall. Stock prices can be influenced, positively or negatively, by external factors such as social media and daily financial news, and these factors must be considered for accurate stock market prediction. In this study, financial news sentiment is used to predict market behavior in support of buy and sell decisions. This is challenging because market sentiment is itself hard to predict and is shaped by many factors, including politics, the global economy, and investor expectations. The required information is extracted from unstructured data, namely financial news articles and headlines. Stock-related data was collected from the Dhaka Stock Exchange to examine the proposed system, which was built for predicting stock market movement based on news impact. This paper proposes (1) a sentiment analysis model that classifies news sentiment as negative, neutral, or positive, since the collected news is in raw text form, and (2) a method that combines historical stock price data with the news sentiment of a specific company to predict future stock movement.

Stock market prediction is a very challenging task, but recent studies have shown that news article sentiment and stock price movement are correlated. Financial news sentiment analysis plays a vital role in this process; more accurate sentiment analysis leads to a more accurate stock prediction model. Existing studies on the sentiment analysis of financial news have found that FinBERT outperforms Naive Bayes and BERT classifiers (Syeda, 2022). Naive Bayes (NB) and Support Vector Machine (SVM) have also been shown to provide good accuracy (Pavitha et al., 2022). Beyond these supervised approaches, lexicon-based methods have also been explored fairly extensively as a lighter-weight alternative — Akter and Aziz (2016) applied a lexicon-based approach to sentiment classification on Facebook group data, and Bonta, Kumaresh, and Janardhan (2019) more broadly surveyed lexicon-based sentiment analysis techniques, concluding that while such methods avoid the need for labeled training data, they tend to trade away some of the accuracy that supervised classifiers can achieve. Sentiment classification itself is not without its complications, either; Arif, Li, Iqbal, and Liu (2018) showed that short, informal text — not unlike the brief, headline-style language common in financial news — poses particular difficulty for sentiment and spam detection alike, a concern that is worth keeping in mind given the headline-based news corpus used in the present study.

Mohan et al. (2019) used five years of closing stock prices for Standard and Poor's 500 companies, together with news articles collected for those companies from international daily newspaper websites between February 2013 and March 2017. They built prediction models based on time series forecasting approaches such as ARIMA, RNN, and Facebook Prophet, and found that RNN outperformed both ARIMA and Facebook Prophet, with a correlation observed between textual information and stock price direction. However, their models were less successful when stock prices were low or highly volatile. Li, Wu, and Wang (2020) used stock data from the Hong Kong Exchange (HKEx) covering January 2003 to March 2008 to examine the effectiveness of incorporating news sentiment into stock prediction. They found that LSTM outperformed multiple kernel learning (MKL) and SVM in both prediction accuracy and F1 score, and that financial news sentiment analysis improved prediction accuracy. Li, Xie, Chen, Wang, and Deng (2014) showed that, at the stock, sector, and index levels, models incorporating sentiment analysis outperform bag-of-words models on both validation and independent test sets. However, they noted that focusing only on positive and negative sentiment dimensions is not sufficient to achieve perfect predictions, as models relying solely on sentiment polarity did not perform well across all tests. Shynkevich, McGinnity, Coleman, and Belatreche (2015) explored how financial forecasting could be improved by combining news articles of varying relevance to the target stock using a multiple kernel learning technique that integrates information from five categories of news articles, divided by sector and industry. Their experimental results showed that using all five news categories simultaneously improved prediction performance compared with using fewer categories of news.

A separate but related thread of work has looked beyond traditional news media toward social media as a sentiment source. Batra and Daudpota (2018) integrated StockTwits data with sentiment analysis and reported improved prediction of stock price movement when social sentiment was added to the feature set, while Derakhshan and Beigy (2019) similarly found that sentiment extracted from stock-focused social media platforms contributed meaningfully to price movement prediction. This pattern is not unique to financial markets, either — Chauhan, Sharma, and Sikka (2021) observed that social media sentiment has likewise proven useful in predicting outcomes in an entirely different domain, election forecasting, lending some support to the broader idea that sentiment signals extracted from public text, regardless of the specific platform or topic, tend to carry genuine predictive information rather than just noise. Biswas, Ghosh, Chakraborty, Roy, and Bose (2020) extended this line of inquiry specifically to economically turbulent conditions, examining the scope of news-based sentiment analysis for stock market and GDP prediction during periods of struggling economic performance, and Eck, Germani, Sharma, Seitz, and Ramdasi (2021) likewise classified financial news articles as a basis for predicting broader stock market performance, reinforcing the idea that the news-to-price relationship explored in the present study has been observed across a range of framings and market contexts.

On the forecasting side, LSTM-based architectures have continued to dominate recent work, often in combination with sentiment or under unusual market conditions. Chou, Park, and Chou (2021) combined sentiment analysis with LSTM to predict stock closing prices in the aftermath of the COVID-19 pandemic, while Budiharto (2021) applied LSTM forecasting to Indonesian stock prices specifically during the COVID-19 period, both finding that the pandemic's disruption to ordinary market behavior did not prevent LSTM-based approaches from producing reasonably useful forecasts. Dutta, Pooja, Jain, Panda, and Nagwani (2021) took a slightly different route, proposing a hybrid deep learning approach to stock price prediction that combined multiple architectural elements rather than relying on LSTM alone, and Elena (2021) explored an alternative algorithmic angle entirely, using XGBoost together with sentiment analysis to predict the directional movement of the OMXS30 index — a reminder that LSTM, while popular, is far from the only viable modeling choice in this space. Underlying much of this forecasting work is the basic statistical character of stock price series themselves; Chaudhuri, Mukherjee, Chowdhury, and Sadhukhan (2018) examined the fractality and stationarity properties of stock market data, findings that help explain why models such as LSTM — built to handle complex, non-stationary temporal dependencies — have become a natural fit for this kind of forecasting task rather than simpler linear alternatives.

2. Methodology

2.1 Study Design and Analytical Framework

We approached this problem as a two-stage predictive modeling task rather than a single end-to-end pipeline, and that distinction matters for how the rest of this section is organized. The first stage concerns itself purely with language: classifying the emotional valence of financial news text. The second stage is concerned with time: forecasting next-day stock price movement using both historical price and the sentiment signal produced in stage one. We deliberately kept these stages separable and independently evaluable, partly because it lets us diagnose where errors originate — a weak sentiment classifier versus a weak time-series model produce very different failure signatures — and partly because it mirrors how most of the prior literature in this space has framed the problem (Li et al., 2014; Li, Wu, & Wang, 2020; Mohan et al., 2019). Figure 1 summarizes the overall architecture, and the subsections below walk through each component in the order data actually flowed through the system: acquisition, preprocessing, feature construction, model training, and evaluation.

2.2 Data Sources and Acquisition

Because no sentiment-labeled dataset existed for our target company's news coverage, we first needed a separate, independently labeled corpus large enough to train a general-purpose financial sentiment classifier. For this purpose we used the India Financial News Headlines Sentiments dataset, retrieved from Kaggle (available at https://www.kaggle.com/datasets/harshrkh/india-financial-news-headlines-sentiments). We chose this corpus for two practical reasons: its scale, and its provenance. It contains more than 200,000 financial news headlines spanning 2017 through 2021, each pre-labeled by sentiment polarity, which gave us enough volume to train a supervised classifier without immediately running into data scarcity problems — a concern that is not trivial in financial NLP, where labeled corpora tend to be small and domain-specific. From the full dataset we retained three fields relevant to our task: the sentiment label, the headline title, and the publication date. After filtering to the positive and negative classes used for binary classification, the working corpus comprised 92,383 positive headlines and 108,118 negative headlines — a class distribution that, while not perfectly balanced, was close enough that we did not apply additional resampling.

2.3 Target-Company News Corpus

The sentiment classifier trained above is only useful if it can be applied to news about the specific company we are trying to forecast, and that corpus had to be built separately. We constructed a custom web scraper to collect news headlines from a recognized financial news portal, restricting collection to outlets with an established editorial track record in order to limit the risk of incorporating fabricated or low-credibility "fake news" into the sentiment pipeline — a concern that, anecdotally, comes up often enough in financial NLP work that we felt it warranted explicit mention rather than assuming source quality. This process yielded a corpus of more than 500 news items pertaining to the target company.

2.4 Historical Stock Price Data

Daily historical price data for the target company were obtained from a publicly available Kaggle repository covering Apple Inc. (ticker: AAPL), spanning a seven-year period. We used Apple as the initial test case for the proposed pipeline, partly because of data availability and partly because a heavily traded, news-saturated stock offers a reasonably stringent test of whether sentiment signal can be extracted at all. The raw download included multiple price fields (open, high, low, close, volume); following standard practice in related forecasting work (Mohan et al., 2019), we retained only the daily closing price as the price-based feature for downstream modeling, since closing price is the figure most consistently used across the literature for end-of-day movement prediction and avoids the added noise of intraday fluctuation.

2.5 Data Preprocessing

2.5.1 Text Preprocessing for Sentiment Analysis

Before any classifier could be trained, the raw headline text needed to be normalized — this step is unglamorous but, in our experience, disproportionately affects downstream accuracy. Two preprocessing operations were applied to every headline in both the training corpus and the target-company corpus: conversion of all text to lowercase, which reduces token sparsity that would otherwise arise from inconsistent capitalization (e.g., treating "Stock" and "stock" as distinct tokens), and removal of common English stop words, which allows the classifier to concentrate its discriminative capacity on lexically meaningful terms rather than high-frequency function words that carry little sentiment information.
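The two normalization steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the stop-word list is a short hypothetical stand-in (the paper does not say which list was used), and whitespace tokenization is an assumption.

```python
# Illustrative stop-word list; an assumption, since the paper does not name one.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "on", "for", "and", "is", "as"}


def preprocess(headline):
    """Lowercase a headline, split it on whitespace, and drop stop words."""
    tokens = headline.lower().split()
    return [tok for tok in tokens if tok not in STOP_WORDS]


print(preprocess("Stock of the Company Rises on Strong Earnings"))
# ['stock', 'company', 'rises', 'strong', 'earnings']
```

After lowercasing, "Stock" and "stock" map to the same token, which is the sparsity reduction the paragraph refers to.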

2.6 Feature Engineering for Stock Prediction

Following sentiment classification (described below), each headline in the target-company corpus was assigned a predicted sentiment label, and these labels were then aggregated to the daily level: for every trading day, we counted the number of positive, neutral, and negative news items published. This daily aggregation was necessary because the price-prediction model operates on a daily time step, while the raw news data arrives at an irregular, sub-daily frequency; aggregating sentiment counts per day is what allowed us to align the two data streams into a single coherent feature table. The resulting integrated dataset comprised five features per day, summarized in Table I: closing price, compound sentiment polarity, and the confidence scores associated with positive, negative, and neutral news classifications.
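The daily alignment step can be illustrated as follows. This is a sketch under stated assumptions: the dates, labels, and prices are made-up placeholders, the function names are hypothetical, and only the per-day label counts are shown (the compound and confidence features in Table I are not reproduced here).

```python
from collections import Counter, defaultdict

# Made-up (date, predicted_label) pairs standing in for classifier output.
labelled_news = [
    ("day-1", "positive"),
    ("day-1", "negative"),
    ("day-1", "positive"),
    ("day-2", "neutral"),
]
# Made-up closing prices keyed by the same day identifiers.
closing_price = {"day-1": 100.0, "day-2": 101.5}


def daily_sentiment_counts(items):
    """Count how many headlines of each label were published on each day."""
    counts = defaultdict(Counter)
    for day, label in items:
        counts[day][label] += 1
    return counts


def build_feature_table(items, prices):
    """Join per-day label counts onto the daily closing price, one row per day."""
    counts = daily_sentiment_counts(items)
    table = []
    for day in sorted(prices):
        c = counts.get(day, Counter())  # days without news get zero counts
        table.append({
            "date": day,
            "close": prices[day],
            "positive": c["positive"],
            "neutral": c["neutral"],
            "negative": c["negative"],
        })
    return table


for row in build_feature_table(labelled_news, closing_price):
    print(row)
```

Iterating over the price table rather than the news table keeps one row per trading day even when no headlines were published that day.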

2.7 Feature Extraction for Text Representation

Raw text cannot be fed directly into a machine learning classifier, so a numerical representation step was required to bridge the gap between unstructured language and the vector-based input that supervised models expect. We considered three families of text-vectorization approaches for this purpose.

Term frequency–inverse document frequency (TF-IDF) was used as a candidate representation; this statistic captures how important a given word is to a document, with the underlying intuition that a term's weight should increase with its frequency within a document but be discounted if it appears ubiquitously across the entire corpus (Su et al., 2011). We also used the CountVectorizer implementation from the Scikit-learn package (Géron, 2022), which converts text into a numerical vector based on simple frequency counts of each token, providing a more straightforward bag-of-words baseline against which the TF-IDF representation could be compared.
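To make the difference between raw counts and TF-IDF weights concrete, the sketch below computes both for three toy headlines. It assumes one common textbook variant (term frequency divided by document length, idf = log(N / df)); scikit-learn's TfidfVectorizer applies a smoothed idf and vector normalization by default, so its numbers would differ. The headlines are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)  # the plain bag-of-words counts
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    ["stock", "rises", "earnings"],
    ["stock", "falls", "lawsuit"],
    ["stock", "steady"],
]
weights = tf_idf(docs)
print(weights[0])
```

Here "stock" occurs in every headline, so its idf is log(3/3) = 0 and its weight vanishes, while "rises" keeps a positive weight. A pure count representation would instead give both terms the same value of 1 in the first headline.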

In addition, we considered embedding-based approaches — specifically Word2Vec and FastText — both of which pursue the same underlying objective of learning dense vector representations for words, but differ in how they handle the substructure of language. FastText extends the Word2Vec approach by incorporating subword (n-gram) information, which comes at the cost of longer training time, owing to the substantially larger number of n-gram units relative to whole words, but in exchange offers improved handling of rare or out-of-vocabulary terms — a property that is particularly relevant to financial headline text, where novel company names, ticker symbols, or industry-specific terminology routinely fall outside a fixed vocabulary.
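The subword idea can be illustrated by listing the character n-grams of a token. The sketch below follows the convention from the FastText literature of wrapping a word in "<" and ">" boundary markers and using n-gram lengths from 3 to 6; the function name is hypothetical, and the study's actual FastText settings are not stated.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of a word wrapped in boundary markers."""
    wrapped = "<" + word + ">"
    grams = []
    for n in range(n_min, n_max + 1):
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


print(char_ngrams("aapl", 3, 3))
# ['<aa', 'aap', 'apl', 'pl>']
```

Because an unseen token still shares n-grams with known ones (for example "stocks" and "stock"), a vector can be assembled for it from subword vectors, which is the out-of-vocabulary benefit described above. The larger number of units per word is also why training takes longer.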

2.8 Sentiment Classification Models

Table I. Feature Description. These five features (closing price, compound sentiment, and the positive, negative, and neutral confidence scores) constitute the input feature set for the Long Short-Term Memory (LSTM) stock prediction model described in the Methods section. Sentiment-derived features were generated by aggregating day-level outputs of the Naive Bayes and Support Vector Machine classifiers across all news items published on a given trading day.

Feature     Meaning
Price       The closing price of a company
Compound    Polarity of news sentiment
Positive    Confidence of positive news
Negative    Confidence of negative news
Neutral     Confidence of neutral news

Fig. 1. Architecture of the Proposed Stock Movement Prediction System. Schematic overview of the end-to-end system pipeline. Unlabeled financial news text is first passed through the sentiment analysis model, which outputs a categorical sentiment label (positive, negative, or neutral) for each news item. These outputs, together with daily closing price data obtained independently from stock market records, are combined into a unified feature set comprising sentiment score and closing price. This combined feature set is then supplied to the stock prediction model, which outputs a forecast of stock price movement for the following trading day. Dashed horizontal lines demarcate the three functional stages of the pipeline: text-based sentiment inference, feature integration, and price-movement forecasting.

After vectorization, we trained two supervised classifiers — Naive Bayes and Support Vector Machine — selected because both have demonstrated reasonable performance on short-text sentiment classification tasks in prior financial NLP work (Pavitha et al., 2022), while remaining computationally lightweight relative to transformer-based alternatives. Figure 3 presents the overall architecture of the sentiment classification component.

The Naive Bayes classifier rests on Bayes' theorem, which gives the probability of an event conditional on prior knowledge. The classifier adds an assumption of feature independence that is admittedly strong and often violated in practice:

P(A|B) = [P(A) × P(B|A)] / P(B)

where P(A) denotes the prior probability of class A, P(A|B) denotes the posterior probability of class A given evidence B, and P(B|A) denotes the likelihood of observing evidence B given class A.

For the specific text-classification context here, we implemented the Multinomial Naive Bayes variant, which is the form most commonly applied in natural language processing tasks. This variant operates on term frequencies — counts of how often a given word occurs within a document — which are first normalized by document length and then used to compute maximum-likelihood estimates of the conditional probabilities from the training data (Su et al., 2011).
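A compact version of the multinomial model makes the estimation step concrete. This is a sketch, not the study's implementation: it adds Laplace (add-one) smoothing via `alpha`, which the text does not mention, to avoid zero probabilities, and it skips the document-length normalization described above. The training headlines and function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods from token lists."""
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
        vocab.update(tokens)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_docs.items()}
    log_likelihood = {}
    for c in class_docs:
        total = sum(term_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            t: math.log((term_counts[c][t] + alpha) / total) for t in vocab
        }
    return log_prior, log_likelihood


def nb_predict(tokens, log_prior, log_likelihood):
    """Pick the class with the highest posterior score; unseen tokens are skipped."""
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][t] for t in tokens if t in log_likelihood[c]
        )
    return max(scores, key=scores.get)


docs = [
    ["profit", "surges"],
    ["record", "profit"],
    ["shares", "plunge"],
    ["profit", "warning", "plunge"],
]
labels = ["positive", "positive", "negative", "negative"]
log_prior, log_likelihood = train_multinomial_nb(docs, labels)
print(nb_predict(["shares", "plunge"], log_prior, log_likelihood))
# negative
```

Working in log space turns the product of per-word likelihoods into a sum, which avoids numerical underflow on longer documents.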

2.9 Support Vector Machine (SVM) Classifier

We also implemented a Support Vector Machine classifier, one of the most widely adopted supervised algorithms for textual polarity detection. SVM is capable of both classification (predicting a discrete label) and regression (predicting a continuous value) tasks, and for sentiment classification, the algorithm seeks the hyperplane that optimally separates classes when the data are projected into an n-dimensional feature space. Where the data are not linearly separable in their original space, kernel functions — linear, sigmoid, radial basis function (RBF), polynomial, and other non-linear variants — can be used to transform the feature space to permit separation.

For this study, we used a linear kernel, on the grounds that text classification tasks typically involve a very high-dimensional feature space (each distinct word or token effectively constitutes its own feature), and linearly separable structure tends to emerge naturally at that dimensionality. Practically speaking, the linear kernel also trains substantially faster than non-linear alternatives, and tends to perform well whenever a reasonably clear margin separates the classes — both considerations that were relevant given the dataset size involved.
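For intuition about what a linear-kernel SVM learns, the sketch below fits a weight vector and bias by subgradient descent on the regularized hinge loss. This is a toy stand-in, not the solver used in the study: the learning rate, regularization strength, epoch count, and the two-feature count vectors are all assumptions chosen for illustration.

```python
def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=100):
    """Fit w and b by subgradient descent on the regularized hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Inside the margin or misclassified: shrink w, step toward yi * xi.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Correct with enough margin: only the regularizer acts.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def svm_predict(w, b, x):
    """Classify by which side of the hyperplane w.x + b = 0 the point falls on."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Made-up bag-of-words counts for the tokens ("surges", "plunge").
X = [[2, 0], [1, 0], [0, 1], [0, 2]]
y = [1, 1, -1, -1]  # +1 = positive headline, -1 = negative headline
w, b = train_linear_svm(X, y)
print([svm_predict(w, b, x) for x in X])
```

The learned weight on "surges" ends up positive and the weight on "plunge" negative, so the separating hyperplane can be read directly as per-token evidence, one practical advantage of the linear kernel in a high-dimensional text feature space.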

Stock Price Movement Prediction Model

2.10 Rationale for Model Selection

For the price-forecasting component, we selected a Long Short-Term Memory (LSTM) recurrent neural network architecture. LSTM networks have become a common choice in stock market prediction tasks because of their explicit design for capturing long-range temporal dependencies in sequential data (Li, Wu, & Wang, 2020) — a property that is directly relevant here, since closing-price data is itself a time series, and short-term memory architectures (or simple feedforward models) tend to lose access to longer-horizon patterns that may be predictive of future movement.
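For reference, the standard LSTM cell update is reproduced below. The paper does not state which cell variant was used, so this is the common textbook formulation rather than a description of the authors' network. Here x_t is the input at step t, h_t the hidden state, c_t the cell state, σ the logistic sigmoid, and ⊙ element-wise multiplication.

```latex
\begin{align*}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)} \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{align*}
```

The additive update of c_t is what lets information persist across many time steps: when the forget gate stays close to 1, earlier content is carried forward largely unchanged.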

2.11 Data Partitioning

The combined dataset — closing price plus daily aggregated sentiment features — was partitioned chronologically into a training set comprising the first 80% of observations and a held-out test set comprising the remaining 20%. We deliberately avoided a random train/test split here, since shuffling time-series data would allow the model to be evaluated on dates that precede some of its training examples, producing an artificially optimistic and practically meaningless estimate of forecasting performance. The chronological split instead ensures that the test set strictly represents future time periods relative to training, which is the only evaluation setup that meaningfully approximates how the model would be used in practice.
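The chronological 80/20 split amounts to cutting the date-ordered table at a single index. The sketch below assumes the rows are already sorted by date; the function name is hypothetical.

```python
def chronological_split(rows, train_fraction=0.8):
    """Split date-ordered rows so every test row comes after every training row."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


days = list(range(10))  # stand-in for ten consecutive trading days
train, test = chronological_split(days)
print(train, test)
# [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
```

No shuffling takes place, so the latest training day always precedes the earliest test day.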

2.12 Input Normalization

Prior to model training, all input features were normalized to a common scale, typically the [0, 1] interval. This step matters for two reasons: it prevents features with larger raw numeric ranges (such as closing price, which can run into the hundreds of dollars) from dominating the learning process relative to smaller-scale sentiment confidence scores, and it generally improves the numerical stability and convergence behavior of gradient-based optimization during neural network training.
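Min-max scaling to [0, 1] can be sketched as below. One assumption goes beyond the text: the minimum and maximum are taken from the training portion only and then reused on the test portion, a common precaution against leaking test-period information. The paper does not say how its scaler was fitted, and the prices are made up.

```python
def fit_min_max(values):
    """Return the (min, max) of the training values."""
    return min(values), max(values)


def transform(values, lo, hi):
    """Scale values with a previously fitted (lo, hi) range."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # constant feature: map everything to 0
    return [(v - lo) / span for v in values]


train_close = [100.0, 110.0, 120.0]  # made-up training prices
test_close = [115.0, 130.0]          # made-up test prices

lo, hi = fit_min_max(train_close)
scaled_train = transform(train_close, lo, hi)
scaled_test = transform(test_close, lo, hi)
print(scaled_train)  # [0.0, 0.5, 1.0]
print(scaled_test)   # [0.75, 1.5]
```

Under this assumption a test value can fall outside [0, 1], as the 1.5 shows, whenever the test period trades beyond the training range.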

Fig. 2. Class Distribution of the Sentiment-Labeled News Headline Corpus. Bar chart showing the number of headlines per sentiment class in the labeled training corpus (India Financial News Headlines Sentiments dataset; n = 200,501) used to train the sentiment classification models. The negative class comprised 108,118 headlines and the positive class comprised 92,383 headlines, reflecting a modest class imbalance that was not corrected with additional resampling prior to model training.

Fig. 3. Architecture of the Financial News Sentiment Analysis Model. Flow diagram of the sentiment classification pipeline, from raw input to final sentiment output. Raw news headlines (DataSet [News]) are first processed through stop-word filtering and noise removal, after which four candidate text-vectorization approaches

Fig. 4. Relationship Between Daily News Sentiment and Stock Closing Price. Stacked bar chart comparing daily closing price (light blue) and net news sentiment polarity (dark blue) for the target stock across the study period (December 2006–February 2007).

Fig. 5. LSTM-Predicted Versus Actual Closing Price Using Closing Price Alone. Line plot comparing actual adjusted closing price (Real price, blue) against price predicted by the Long Short-Term Memory (LSTM) model (Predicted price, orange) on the held-out test set (November 9–December 1, 2016),

Fig. 6. LSTM-Predicted Versus Actual Closing Price Using Closing Price and News Sentiment. Line plot comparing actual adjusted closing price (Real price, blue) against price predicted by the Long Short-Term Memory (LSTM) model (Predicted price, orange) on the same held-out test set (November 9–December 1, 2016)

Because LSTM networks operate on sequences rather than single time points, we constructed fixed-length input windows from the normalized time series, where each window comprises a contiguous span of historical daily observations and the associated target is the subsequent day's stock movement. We set the window length to 14 days. This choice reflects a practical compromise: a window too short risks omitting genuinely predictive longer-term patterns, while a window too long increases the parameter burden of the model relative to the size of the training data. A window of 14 trading days (roughly three calendar weeks) struck a reasonable balance given our dataset size, though we note this parameter was not exhaustively tuned and represents an area for refinement in future work.
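The windowing described above can be sketched as a sliding slice over the daily feature rows. The function name and toy data are hypothetical, and the target is taken here to be the value on the day immediately after each window.

```python
def make_windows(rows, targets, window=14):
    """Pair each run of `window` consecutive rows with the next day's target."""
    X, y = [], []
    for end in range(window, len(rows)):
        X.append(rows[end - window:end])  # days end-window .. end-1
        y.append(targets[end])            # the day right after the window
    return X, y


# Toy series: 20 days, one feature per day, target equal to the day index.
rows = [[float(d)] for d in range(20)]
targets = [float(d) for d in range(20)]
X, y = make_windows(rows, targets)
print(len(X), len(X[0]), y[0])
# 6 14 14.0
```

A series of T days yields T - 14 training pairs, so longer windows leave fewer examples. This is the trade-off against dataset size mentioned above.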

2.13 Model Architecture and Training Procedure

The LSTM network was trained on the windowed training data, learning to map each 14-day input sequence to its corresponding target value. Model parameters (weights and biases) were updated iteratively via gradient-based optimization to minimize prediction error between the model's output and the true target. Specifically, we used mean squared error as the loss function and the Adam optimizer to perform parameter updates — a pairing that is fairly standard for regression-style sequence prediction tasks of this kind, owing to Adam's adaptive learning rate behavior, which tends to produce more stable convergence than plain stochastic gradient descent, particularly when feature scales or gradient magnitudes vary across training.
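For readers who want the objective and update rule written out, the block below gives mean squared error over N training windows and the standard Adam update. The symbols (α for step size, β₁ and β₂ for decay rates, ε for the stability constant) follow the usual Adam notation; the values used in this study are not reported, so none are assumed here.

```latex
\begin{align*}
\mathcal{L}(\theta) &= \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i(\theta) \right)^2 \\
g_t &= \nabla_\theta \mathcal{L}(\theta_{t-1}) \\
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\
\theta_t &= \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{align*}
```

Dividing by the running root-mean-square of the gradient gives each parameter its own effective step size, which is the adaptive behavior the paragraph credits for more stable convergence.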

Two variants of this model were trained and compared: one using only the normalized closing price as input (Figure 5), and a second incorporating both closing price and the daily-aggregated sentiment features described above (Figure 6). This comparison was the central manipulation of the study, allowing us to isolate whatever incremental contribution sentiment information makes over and above price history alone.

2.14 Model Evaluation

Once trained, each LSTM variant was applied to the held-out test set to generate predictions on previously unseen data, and predicted closing prices were compared against actual observed prices across the test period. We additionally examined the correlation between daily news sentiment polarity and same-day closing price movement (Figure 4) as a complementary, model-independent check on whether the underlying sentiment-price relationship the system depends on was actually present in the data, rather than relying solely on downstream forecasting accuracy to make that case indirectly. Because the sentiment classifier's own accuracy directly bounds how much useful signal can flow into the price model, we also report classifier-level performance (Naive Bayes and SVM accuracy) separately from the downstream LSTM forecasting results, so that errors attributable to language classification versus time-series modeling can be distinguished rather than conflated into a single end-to-end metric.
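The two checks described above can be computed with a few lines of Python. The choice of metrics is an assumption: the paper does not name the correlation coefficient or the error measure it used, so Pearson correlation and root mean squared error are shown here only as common options. The sample numbers are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def rmse(actual, predicted):
    """Root mean squared error between observed and predicted prices."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))


# Made-up daily sentiment polarity and same-day price changes.
sentiment = [0.4, -0.2, 0.1, 0.6, -0.5]
price_change = [1.2, -0.3, 0.0, 1.5, -1.1]
print(round(pearson(sentiment, price_change), 3))

# Made-up actual and predicted closing prices for three test days.
print(rmse([100.0, 102.0, 101.0], [101.0, 102.0, 99.0]))
```

Computing the same error measure for the price-only and the price-plus-sentiment model on the identical test window would give a single number for the comparison between Figures 5 and 6.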

 

3. Results

3.1 Overview

Before getting into the numbers, it's worth stating plainly what we were trying to find out: does financial news sentiment actually carry information that helps predict stock price movement, beyond what historical price alone can offer? The results below are organized to answer that question in stages — first by examining whether sentiment and price even move together in the data (a sanity check, in a sense, before trusting any model output), then by evaluating how well the sentiment classifier itself performed, and finally by comparing the two LSTM-based forecasting variants against one another.

3.2 Relationship Between News Sentiment and Closing Price

We began with a simple step: looking at how sentiment and price moved together over time, without yet involving any predictive model. Correlational analysis of this kind, however unglamorous next to a neural network, is a useful check before investing further effort in more complex architectures, and that is largely how we used it here. Figure 4 plots the relationship between daily news sentiment and the closing price of the target stock. The pattern was not perfectly consistent across every day, but it held often enough to be notable: on most days where news sentiment skewed positive, the share price also tended to climb, pointing to a positive association between the sentiment signal extracted from headlines and subsequent (or concurrent) movement in price. We want to be careful with the word "causal" here, since this was a correlational observation rather than a controlled experiment, but the direction of the relationship is at least consistent with what earlier work on this topic has reported (Li et al., 2014; Li, Wu, & Wang, 2020).

3.3 Performance of the Sentiment Classification Model

Naturally, the next question was how reliable the sentiment labels feeding into the price model actually were, because if the classifier itself is shaky, everything downstream inherits that uncertainty. The sentiment analysis model, trained using the procedures described in the Methods section, reached an accuracy of approximately 75%. That figure is, frankly, a mixed result. It is high enough to suggest the model is picking up genuine signal rather than guessing at random, but not so high that we would call it a finished product; given that sentiment classification accuracy effectively sets a ceiling on how well the downstream price model can perform, this is not a trivial shortfall. Prior studies in this space have generally found transformer-based models such as FinBERT to outperform more traditional classifiers like Naive Bayes (Syeda, 2022), so, motivated in part by that reported performance gap, we also experimented with FinBERT in an attempt to push sentiment classification accuracy higher. The class distribution of the underlying training data used for this stage is shown in Figure 2, and the overall architecture of the sentiment pipeline is summarized in Figure 3.

3.4 Stock Price Prediction Using Closing Price Alone

To isolate what sentiment was actually contributing, rather than just assuming it helped, we first trained an LSTM model using only the historical closing price as input, with a window size of 14 days, exactly as described earlier. Figure 5 shows how this model's predicted prices compared against the real, observed prices across the test period. The fit was, to put it plainly, underwhelming. The predicted trajectory captured some of the broad movement but consistently lagged behind or smoothed over sharper price swings, and the resulting accuracy was noticeably lower than what we obtained once sentiment was added back into the picture (described next). This is not entirely surprising in hindsight: closing price alone is only one slice of the information that moves markets day to day, and it says nothing about the news, sentiment, or external events that often precede a price shift rather than follow it.

3.5 Stock Price Prediction Combining Closing Price and News Sentiment

We then trained a second LSTM variant, identical in architecture and training procedure to the first, except that it incorporated the daily-aggregated sentiment features (compound polarity, and the positive, negative, and neutral confidence scores) alongside closing price. The results, shown in [Figure 6], told a noticeably more encouraging story. This combined model achieved higher predictive accuracy than the price-only version, and for a number of dates in the test window, the predicted price tracked the actual price quite closely. That said, we'd be overstating things if we called this a clean win across the board — there were still stretches where the model's predictions diverged from reality, sometimes by a fair margin, and we don't want to gloss over that in the interest of telling a tidier story. Still, taken as a whole, the comparison between Figures 5 and 6 offers reasonably direct evidence that financial news sentiment adds predictive value on top of price history alone, which aligns with what has been reported in related work using different markets and modeling approaches (Li, Wu, & Wang, 2020; Mohan et al., 2019; Shynkevich et al., 2015).
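One way to build the combined input is sketched below, under stated assumptions. Averaging headline scores within a day, the feature order, and the neutral fill for days without news are all illustrative choices, not details confirmed by the text; the helper names `aggregate_daily` and `build_rows` and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Feature order assumed for illustration: compound, positive, negative, neutral.
FEATURES = ("compound", "pos", "neg", "neu")


def aggregate_daily(
    headlines: List[Tuple[str, Dict[str, float]]]
) -> Dict[str, List[float]]:
    """Average headline-level scores into one feature vector per date."""
    by_date: Dict[str, List[Dict[str, float]]] = defaultdict(list)
    for date, scores in headlines:
        by_date[date].append(scores)
    return {
        date: [mean(item[name] for item in items) for name in FEATURES]
        for date, items in by_date.items()
    }


def build_rows(
    closes: Dict[str, float], daily: Dict[str, List[float]]
) -> List[List[float]]:
    """Join closing price with daily sentiment, one row per trading day."""
    # Assumption: a trading day with no news is treated as fully neutral.
    no_news = [0.0, 0.0, 0.0, 1.0]
    return [[closes[d]] + daily.get(d, no_news) for d in sorted(closes)]


# Hypothetical values, for illustration only.
headlines = [
    ("2020-01-02", {"compound": 0.5, "pos": 0.6, "neg": 0.1, "neu": 0.3}),
    ("2020-01-02", {"compound": -0.1, "pos": 0.2, "neg": 0.3, "neu": 0.5}),
]
closes = {"2020-01-02": 75.0, "2020-01-03": 74.0}
rows = build_rows(closes, aggregate_daily(headlines))
```

Each row then holds five values (close plus four sentiment features), so the same windowing used for the price-only model applies with a feature dimension of 5 instead of 1.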

3.6 Model Stability and Training Behavior

One issue that came up repeatedly during training, and that we think is worth reporting honestly rather than smoothing over, is that the LSTM architecture had a tendency to overfit fairly easily. Early training runs produced models that performed well on training data but generalized poorly to the held-out test set — a familiar problem with recurrent architectures trained on comparatively limited financial time series. After several rounds of retuning, we arrived at a more stable configuration, one that produced predictions appreciably closer to actual prices than our initial attempts, though we'd stop short of calling the current model fully optimized. There is, in other words, still room between where the model stands now and where a deployment-ready forecasting tool would need to be.
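A common guard against this kind of overfitting is early stopping on a validation loss. The text does not say which retuning steps were actually used, so the sketch below is an assumption offered for illustration only; the class `EarlyStopper`, its parameters, and the loss values are hypothetical.

```python
class EarlyStopper:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.stale_epochs = 0

    def update(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: remember this epoch and reset the counter.
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


# Hypothetical validation losses: improvement stalls after epoch 2.
losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.66]
stopper = EarlyStopper(patience=3)
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.update(epoch, loss):
        stopped_at = epoch
        break
```

In this run training halts at epoch 5 and the weights from epoch 2 would be restored. For time series, the validation slice should come from dates after the training slice, so that the check reflects genuine out-of-sample behavior.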

3.7 Summary of Findings

Pulling the threads together: the data show a discernible positive relationship between financial news sentiment and stock closing price movement [Figure 4]; the sentiment classifier performs at a moderate, though not yet strong, level of accuracy (around 75%); and incorporating sentiment into the LSTM forecasting model meaningfully improves prediction quality relative to using closing price alone, even if neither model variant has yet reached a level of accuracy we'd consider satisfactory for practical trading decisions [Figures 5 and 6]. Taken together, these results offer reasonably consistent support for our original premise — that news sentiment carries useful predictive signal for stock movement — while also making clear that there is meaningful work left before that signal translates into reliably accurate forecasts.
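The sentiment-price relationship summarized above can be quantified in at least two simple ways: a Pearson correlation, and the fraction of days on which sentiment and price change share a sign. The sketch below assumes daily sentiment is compared against the daily change in closing price; the function names and sample values are hypothetical and are not results from the study.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


def direction_agreement(
    sentiment: Sequence[float], price_change: Sequence[float]
) -> float:
    """Fraction of days on which sentiment and price change share a sign."""
    if len(sentiment) != len(price_change) or not sentiment:
        raise ValueError("series must be non-empty and of equal length")
    same = sum(1 for s, p in zip(sentiment, price_change) if s * p > 0)
    return same / len(sentiment)


# Hypothetical daily values, for illustration only.
daily_sentiment = [0.4, -0.2, 0.1, -0.5, 0.3]
daily_change = [1.2, -0.8, -0.3, -1.5, 0.9]
r = pearson(daily_sentiment, daily_change)
agreement = direction_agreement(daily_sentiment, daily_change)  # 0.8
```

Neither measure establishes that sentiment leads price; testing that would require lagging the sentiment series relative to the price series.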

4. Discussion

4.1 Interpreting the Core Finding

The starting premise of this work was fairly simple to state, even if not simple to test: does the emotional tone of financial news carry information that helps anticipate where a stock price is headed, or is that relationship mostly noise dressed up as signal? Based on what we observed, the answer leans, cautiously, toward the former. The positive association between daily news sentiment and closing price movement [Figure 4], paired with the improvement seen when sentiment features were folded into the LSTM forecasting model [Figures 5 and 6], points in a consistent direction — sentiment is not just along for the ride, it appears to be doing some genuine predictive work. That said, we want to resist the temptation to oversell this. The improvement was real, but it wasn't dramatic, and the model still missed the mark on a fair number of test dates. So perhaps the more honest framing is this: sentiment helps, but it isn't a silver bullet, and anyone expecting near-perfect forecasts from this approach alone would likely be disappointed.

4.2 Situating the Results Within Prior Literature

It's worth pausing here to ask how these findings sit alongside what others have already reported, because stock prediction using news sentiment is hardly an untouched research area. Mohan et al. (2019), working with five years of price data for S&P 500 companies alongside news articles gathered from international outlets, found that recurrent neural network approaches outperformed both ARIMA and Facebook Prophet, and—much like what we observed—noted a correlation between textual sentiment and price direction. Interestingly, they also flagged a weakness that echoes our own experience: their models struggled more when prices were low or highly volatile, which suggests that the sentiment-price relationship may not hold uniformly across all market conditions, a nuance our own dataset, being a single company over a fixed period, wasn't really positioned to probe in depth.

Li, Wu, and Wang (2020) reported something a bit more decisive in their work on the Hong Kong Exchange: LSTM outperformed both multiple kernel learning and SVM approaches, and incorporating sentiment improved accuracy beyond what price data alone could achieve. That pattern lines up closely with what we found here, which is reassuring in one sense, since it suggests our result isn't a one-off artifact of this particular dataset. It also raises a slightly uncomfortable question about why our overall accuracy still falls short of what would be needed for practical deployment, despite using a broadly similar architecture. Part of the answer probably lies in scale: their dataset covered an established exchange, while ours was limited to a single company and a comparatively small news corpus.

Li, Xie, Chen, Wang, and Deng (2014) add a useful caveat that we think is directly relevant to our own results: they found that sentiment-aware models did outperform simple bag-of-words approaches at the stock, sector, and index levels, but they were careful to note that relying purely on positive-versus-negative polarity is not enough to achieve consistently strong predictions. Our findings seem to echo that caution rather well. The 75% accuracy ceiling on our sentiment classifier (described in the Results) likely constrains how much further the downstream price model can improve, no matter how the LSTM architecture itself is tuned—garbage in, garbage out, as the saying goes, although "garbage" feels like an unfairly harsh word for a 75%-accurate classifier. Shynkevich, McGinnity, Coleman, and Belatreche (2015) took a related but distinct angle, showing that combining multiple categories of news—divided by sector and industry—improved forecasting performance relative to using a narrower slice of articles. We did not explore that kind of categorical breadth here, and it's a reasonable possibility that doing so might have given our model more to work with than sentiment polarity alone.

4.3 Why the Sentiment Classifier's Accuracy Matters So Much

It's tempting to treat the sentiment model and the price-forecasting model as two separate stories, but really, they're tightly coupled, and that coupling is probably the single biggest factor shaping our overall results. Existing work comparing classifier architectures has found that FinBERT tends to outperform both Naive Bayes and other BERT-based classifiers on financial sentiment tasks (Syeda, 2022), which is part of why we turned to FinBERT ourselves once Naive Bayes and SVM plateaued around 75% accuracy. Pavitha et al. (2022) similarly found that Naive Bayes and SVM deliver reasonably solid results in sentiment classification more broadly (not state-of-the-art, perhaps, but serviceable), and our own results are roughly consistent with that characterization: good enough to be useful, not good enough to be the final word. Whatever error the sentiment classifier carries forward gets baked into the daily sentiment-aggregation features described in the Methods, and from there it propagates directly into the LSTM's input space. So when the price model underperforms on certain dates, it's genuinely difficult to say with confidence whether that's an LSTM limitation, a sentiment-labeling error, or some combination of the two. We think that ambiguity is worth naming plainly rather than glossing over.

4.4 On Overfitting and Model Stability

One thing we didn't expect going in, though perhaps we should have, was how readily the LSTM architecture slipped into an overfit state during early training attempts. This is a fairly well-documented tendency of recurrent networks when trained on time series that are comparatively short or noisy, and our dataset—seven years of single-company price data plus a modest news corpus—falls roughly into that category. After repeated retuning, we did reach a more stable configuration, one that produced predictions noticeably closer to actual values, but we don't think it would be honest to describe the model as fully optimized at this point. There's a meaningful gap between "stable enough to report" and "ready for practical trading use," and we'd place our current model closer to the former.

4.5 Limitations

A few limitations deserve to be stated plainly, partly because we think transparency here matters more than presenting an overly polished narrative. First, the sentiment classifier's accuracy—hovering around 75%—places a fairly hard ceiling on how much benefit the downstream price model can realistically extract from sentiment features, regardless of how the LSTM itself is configured. Second, our stock-price dataset, while spanning seven years, was drawn from a single company (Apple), which limits how confidently these findings can be generalized to other firms, sectors, or market conditions—particularly given that Mohan et al. (2019) found model performance itself varies with volatility and price level, a factor we were not able to examine directly with a single-company dataset. Third, the target-company news corpus, at just over 500 items, is relatively small next to the scale of data used in some comparable studies (Li, Wu, & Wang, 2020; Shynkevich et al., 2015), and a sparser news stream likely makes daily sentiment aggregation noisier than it would be with denser coverage. Finally, the LSTM's tendency toward overfitting, even after tuning, suggests that the current model configuration may not yet be capturing the underlying price dynamics as robustly as we'd like, and some of the prediction error reported in the Results may reflect that instability rather than a genuine ceiling on what sentiment-augmented forecasting can achieve.
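The third limitation can be made concrete with a simple back-of-the-envelope relation. The symbols here are assumptions introduced for illustration: $s_{d,i}$ is the sentiment score of the $i$-th headline on day $d$, $n_d$ is the number of headlines that day, and $\sigma^2$ is the per-headline noise variance, taken to be common across headlines and independent between them (real headlines on the same day are unlikely to be fully independent, so this is an optimistic bound).

```latex
\bar{s}_d = \frac{1}{n_d}\sum_{i=1}^{n_d} s_{d,i},
\qquad
\operatorname{Var}(\bar{s}_d) = \frac{\sigma^2}{n_d},
\qquad
\operatorname{SD}(\bar{s}_d) = \frac{\sigma}{\sqrt{n_d}}
```

Under these assumptions, a day covered by a single headline carries the full per-headline noise, while quadrupling the number of headlines halves the standard deviation of the daily average. That is the sense in which denser coverage would be expected to yield steadier sentiment features.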

4.6 Implications and Future Directions

Taken together, what we think this study contributes—modestly, but we'd argue meaningfully—is further evidence that financial news sentiment is not predictively inert; it adds something over and above raw price history, even within a fairly constrained, single-company setup. Where we go from here seems reasonably clear. Improving the sentiment classifier itself, whether through FinBERT or another transformer-based approach, seems like the most direct way to lift the ceiling described above, since the price model's performance is so closely tethered to the quality of its sentiment inputs. Expanding the underlying dataset—more companies, longer time horizons, and a denser news corpus, perhaps incorporating multiple categories of articles in the spirit of Shynkevich et al. (2015) rather than treating all news as a single undifferentiated stream—would likely help address both the overfitting issue and the generalizability concerns raised above. There may also be value in incorporating additional features beyond price and sentiment alone, given that stock movement is shaped by a tangle of factors—macroeconomic indicators, trading volume, broader market indices—that this study did not attempt to capture. None of this is to say the current results are unimportant; rather, we'd frame this work as one reasonably solid data point in a larger, ongoing effort to figure out just how much of stock market behavior language can actually explain.

5. Conclusion

This study set out to test a fairly simple idea — that the emotional tone of financial news carries forecasting value beyond what historical prices alone can offer — and the evidence, on balance, supports it. Sentiment classification reached moderate accuracy, and incorporating that sentiment into an LSTM forecasting model improved prediction quality over a closing-price-only baseline, even if neither model has yet reached deployment-grade reliability. The relationship was real but imperfect, with errors likely originating from both sentiment misclassification and model instability. Future work should prioritize stronger sentiment models, broader company and news coverage, and additional market features to narrow this remaining gap.

References


Akter, S., & Aziz, M. T. (2016). Sentiment analysis on Facebook group using lexicon-based approach. In 2016 3rd International Conference on Electrical Engineering and Information Communication Technology (ICEEICT) (pp. 1–4). IEEE. https://doi.org/10.1109/ICEEICT.2016.7873080

Arif, M. H., Li, J., Iqbal, M., & Liu, K. (2018). Sentiment analysis and spam detection in short informal text using learning classifier systems. Soft Computing, 22(21), 7281–7291. https://doi.org/10.1007/s00500-017-2729-x

Batra, R., & Daudpota, S. M. (2018). Integrating StockTwits with sentiment analysis for better prediction of stock price movement. In 2018 International Conference on Computing, Mathematics and Engineering Technologies (ICoMET) (pp. 1–5). IEEE. https://doi.org/10.1109/ICOMET.2018.8346382

Biswas, S., Ghosh, A., Chakraborty, S., Roy, S., & Bose, R. (2020). Scope of sentiment analysis on news articles regarding stock market and GDP in struggling economic condition. International Journal of Emerging Trends in Engineering Research, 8(7), 3594–3609. https://doi.org/10.30534/ijeter/2020/117872020

Bonta, V., Kumaresh, N., & Janardhan, N. (2019). A comprehensive study on lexicon-based approaches for sentiment analysis. Asian Journal of Computer Science and Technology, 8(S2), 1–6.

Budiharto, W. (2021). Data science approach to stock prices forecasting in Indonesia during COVID-19 using Long Short-Term Memory (LSTM). Journal of Big Data, 8(1), Article 47. https://doi.org/10.1186/s40537-021-00430-0

Chaudhuri, A., Mukherjee, S., Chowdhury, S., & Sadhukhan, B. (2018). Fractality and stationarity analysis on stock market. In 2018 International Conference on Advances in Computing, Communication Control and Networking (ICACCCN) (pp. 395–398). IEEE. https://doi.org/10.1109/ICACCCN.2018.8748504

Chauhan, P., Sharma, N., & Sikka, G. (2021). The emergence of social media data and sentiment analysis in election prediction. Journal of Ambient Intelligence and Humanized Computing, 12(2), 2601–2627. https://doi.org/10.1007/s12652-020-02423-y

Chou, C., Park, J., & Chou, E. (2021). Predicting stock closing price after COVID-19 based on sentiment analysis and LSTM. In 2021 IEEE 5th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC) (Vol. 5, pp. 2752–2756). IEEE. https://doi.org/10.1109/IAEAC50856.2021.9390845

Derakhshan, A., & Beigy, H. (2019). Sentiment analysis on stock social media for stock price movement prediction. Engineering Applications of Artificial Intelligence, 85, 569–578. https://doi.org/10.1016/j.engappai.2019.07.002

Dutta, A., Pooja, G., Jain, N., Panda, R. R., & Nagwani, N. K. (2021). A hybrid deep learning approach for stock price prediction. In A. Joshi, M. Khosravy, & N. Gupta (Eds.), Machine learning for predictive analysis (pp. 1–10). Springer. https://doi.org/10.1007/978-981-15-7106-0

Eck, M., Germani, J., Sharma, N., Seitz, J., & Ramdasi, P. P. (2021). Prediction of stock market performance based on financial news articles and their classification. In N. Sharma, A. Chakrabarti, V. E. Balas, & J. Martinovic (Eds.), Data management, analytics and innovation (pp. 35–44). Springer.

Elena, P. (2021). Predicting the movement direction of OMXS30 stock index using XGBoost and sentiment analysis [Bachelor's thesis, Blekinge Institute of Technology]. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-21119

Géron, A. (2022). Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow. O'Reilly Media.

Li, X., Wu, P., & Wang, W. (2020). Incorporating stock prices and news sentiments for stock market prediction: A case of Hong Kong. Information Processing & Management, 57(5), 102212. https://doi.org/10.1016/j.ipm.2020.102212

Li, X., Xie, H., Chen, L., Wang, J., & Deng, X. (2014). News impact on stock price return via sentiment analysis. Knowledge-Based Systems, 69, 14–23. https://doi.org/10.1016/j.knosys.2014.04.022

Mohan, S., Mullapudi, S., Sammeta, S., Vijayvergia, P., & Anastasiu, D. C. (2019). Stock price prediction using news sentiment analysis. In 2019 IEEE Fifth International Conference on Big Data Computing Service and Applications (BigDataService) (pp. 205–208). IEEE. https://doi.org/10.1109/BigDataService.2019.00035

Pavitha, N., Pungliya, V., Raut, A., Bhonsle, R., Purohit, A., Patel, A., & Shashidhar, R. (2022). Movie recommendation and sentiment analysis using machine learning. Global Transitions Proceedings, 3(1), 279–284.

Shynkevich, Y., McGinnity, T. M., Coleman, S., & Belatreche, A. (2015). Predicting stock price movements based on different categories of news articles. In 2015 IEEE Symposium Series on Computational Intelligence (pp. 703–710). IEEE. https://doi.org/10.1109/SSCI.2015.107

Su, J., Shirab, J. S., & Matwin, S. (2011). Large scale text classification using semisupervised multinomial Naive Bayes. In Proceedings of the 28th International Conference on Machine Learning (ICML 2011).

Syeda, F. S. (2022). Sentiment analysis of financial news with supervised learning [Manuscript].

 

