ChatGPT and other large language models (LLMs) may someday automate many investment management and finance industry tasks. While that day is not here yet, LLMs are still useful additions to the analyst’s toolkit.
So, based on what we have learned about the new, dark art of prompt engineering, how can quant and fundamental analysts apply LLMs like ChatGPT? How effective a copilot can these technologies be?
Fundamental Analyst Copilot
Stock analysts generally know their companies from top to bottom, so ChatGPT may not reveal anything altogether new about their primary names. But LLMs can generate overviews of less well-known firms quickly and at scale.
Here are the ChatGPT prompts we’d deploy to analyze a hypothetical CompanyX.
Company Overview
- “explain the business model of CompanyX”
- “conduct SWOT analysis of CompanyX” (strengths, weaknesses, opportunities, threats)
- “list 10 competitors of CompanyX”
- “list the 10 main risks to an investment in CompanyX”
Environmental, Social, and Governance (ESG) Overview
- “list and describe 10 key Environmental scandals of CompanyX”
- “list and describe 10 key Governance scandals of CompanyX”
- “list and describe 10 key Social scandals of CompanyX”
- Drill down as appropriate
We’d also add a standard ending to each prompt to increase the chances of an accurate response: “list your sources; if you do not know an answer, write ‘Do not know.’”
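As a minimal illustration of how such prompts might be assembled programmatically, the sketch below builds the overview and ESG prompts for one company and appends the standard accuracy suffix to each. The query_llm helper mentioned in the final comment is a hypothetical placeholder for whatever LLM client a manager actually uses, not a real API.

```python
# Sketch: assemble the overview and ESG prompts for a company, appending the
# standard accuracy suffix to each prompt before it is sent to the model.

ACCURACY_SUFFIX = " list your sources; if you do not know an answer, write 'Do not know.'"

PROMPT_TEMPLATES = [
    "explain the business model of {name}",
    "conduct SWOT analysis of {name}",
    "list 10 competitors of {name}",
    "list the 10 main risks to an investment in {name}",
    "list and describe 10 key Environmental scandals of {name}",
    "list and describe 10 key Social scandals of {name}",
    "list and describe 10 key Governance scandals of {name}",
]


def build_prompts(company: str) -> list:
    """Return the full prompt set for one company, accuracy suffix included."""
    return [template.format(name=company) + ACCURACY_SUFFIX for template in PROMPT_TEMPLATES]


for prompt in build_prompts("CompanyX"):
    print(prompt)
    # response = query_llm(prompt)  # hypothetical LLM client call
```

In practice, the same loop could run over an entire watchlist rather than a single name.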
Case Studies
Now we can test some of these prompts in two simple case studies:
- “summarize: [web address of text document, or paste in the text]”
- “list 10 key negatives” (risky unless we provide source text)
- Drill down as appropriate
We ran the above ChatGPT analysis on two real-life companies: Mphasis, a lightly covered Indian mid-cap, and Vale, a very well-covered Brazilian mining company. We scored the results of each task on a one-to-five scale, with five being the highest. The answers were generated simply by prompting GPT-4 through ChatGPT, but in actual practice, the highest-tech managers would automate much of this process. We would use multiple LLMs, which would give us more control over the responses, greater validation and cross-checking, and much greater scale. Of course, like all ChatGPT-produced results, those below need to be treated with care and not taken at face value, especially if we are relying on the model's training data alone.
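To illustrate the kind of cross-checking we have in mind, the sketch below sends the same prompt to two models and flags answers that diverge enough to need human review. The query_model_a and query_model_b callables are hypothetical placeholders, and the word-overlap measure is a deliberately crude stand-in for a proper comparison.

```python
# Sketch: cross-check one prompt across two LLMs (hypothetical clients) and flag
# responses that disagree enough to warrant human review.

def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity of the word sets of two responses (a crude agreement proxy)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    return len(set_a & set_b) / max(len(set_a | set_b), 1)


def cross_check(prompt: str, query_model_a, query_model_b, threshold: float = 0.3) -> dict:
    """Send one prompt to two models and flag answers that appear to disagree."""
    resp_a, resp_b = query_model_a(prompt), query_model_b(prompt)
    agreement = word_overlap(resp_a, resp_b)
    return {"prompt": prompt, "response_a": resp_a, "response_b": resp_b,
            "agreement": agreement, "needs_review": agreement < threshold}
```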
1. Mphasis Company Overview
While the results are hardly revelatory, ChatGPT does provide an informative, high-level summary of Mphasis. We also prompt it for sources and explicitly instruct it not to make things up. Such measures improve accuracy but are not foolproof.
As we proceed, the LLM offers up more interesting insights.
We can now drill down with a little SWOT analysis.
Our SWOT analysis identifies “Dependencies on Certain Industries” as a potential weakness for the company. So, we pose additional questions to help understand the underlying context.
Mphasis Company Overview Score: 4
2. Vale ESG Overview
Vale’s record on ESG issues has generated headlines and ChatGPT picks up on the major themes. A simple prompt for a specific aspect — “Social” — yields accurate results, even though the system cautions that it cannot attribute sources and recommends we cross-reference the response. To get into more detail, we need to delve deeper than ChatGPT allows.
Vale ESG Overview Score: 3
Ground Truthing: ChatGPT Interrogates and Summarizes
Latest Mphasis Data Summary
ChatGPT can summarize and interrogate a company's latest earnings call, news flow, third-party analysis, or whatever data we provide. This supplied information is the "ground truth," a different use of the expression than in supervised machine learning. But if we don't specify and deliver the text for ChatGPT to analyze, as we saw above, it will rely only on its training data, which increases the risk of misleading "hallucinations." Moreover, the end date of the LLM's training data will limit the possible insights.
Another point to keep in mind: Official company communications tend to be upbeat and positive. So rather than ask ChatGPT to “summarize” an earnings call, we might request that it “list 10 negatives,” which should yield more revealing answers. ChatGPT delivers fast and effective results. Though they are often obvious, they may reveal important weaknesses that we can probe further.
Latest Mphasis Data Summary Score: 5
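A minimal sketch of this ground-truthing step, assuming the transcript is available as a plain-text file and that a hypothetical query_llm client handles the actual model call, might look like this:

```python
# Sketch: supply the source text ("ground truth") directly in the prompt so the
# model works from what we provide rather than from its training data alone.

from pathlib import Path


def list_negatives(transcript_path: str, query_llm, n: int = 10) -> str:
    """Ask the model for the n key negatives in a supplied earnings call transcript."""
    transcript = Path(transcript_path).read_text(encoding="utf-8")
    prompt = (
        f"Using only the earnings call transcript below, list {n} key negatives. "
        "If you cannot find an answer in the text, write 'Do not know.'\n\n"
        f"{transcript}"
    )
    return query_llm(prompt)  # query_llm is a hypothetical LLM client
```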
Quant Analyst Copilot
ChatGPT can write simple functions and describe how to produce particular types of code. In fact, OpenAI's Codex, a GPT-3 descendant fine-tuned on computer programming code, already powers the auto-complete coding tool GitHub Copilot, and GPT-4 will be the basis of the forthcoming and more comprehensive GitHub Copilot X. Nevertheless, unless the function is fairly standard, ChatGPT-generated code nearly always requires tweaks and changes to run correctly and efficiently, so it serves best as a template. LLM autopilots thus appear unlikely to replace quant coders anytime soon.
A quant might use ChatGPT for the three tasks described below. Here we are simply prompting ChatGPT; in practice, we would access code-specific LLMs and integrate other tools to generate far more reliable code automatically.
1. Develop an Entire Investment Pipeline
ChatGPT can partly execute complex instructions, such as “write python functions to drive quant equity investment strategy.” But again, the resulting code may need considerable editing and finessing. The challenge is getting ChatGPT to deliver code that is as close as possible to the finished article. To do that, it helps to deploy a numbered list of instructions with each list item containing important details.
In the example below, we prompt ChatGPT to create five functions as part of a factor-based equities investment strategy and score each function on our five-point scale. For slightly higher accuracy, we would also construct a prompt for the system to “ensure packages exist, ensure all code parses.”
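For illustration only, a numbered-list prompt along these lines, written in our own wording rather than the exact prompt we used, might look like the following:

```python
# Illustrative numbered-list prompt (our own wording, not the exact prompt used),
# with the parsing and package checks appended at the end.
PIPELINE_PROMPT = """Write Python functions for a factor-based equity strategy:
1. Download factor time-series data: download the zip file from the Kenneth R. French
   Data Library, unzip it, and read the csv into a Pandas DataFrame.
2. Download equity returns data using get_data_yahoo and read them into a Pandas DataFrame.
3. Align the dates and frequencies of the factor data and the equity returns.
4. Use a simple factor model to forecast expected stock returns.
5. Construct portfolios from the forecasts and run a simulation over time.
Ensure packages exist, ensure all code parses."""
```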
1. Download Factor Time-Series Data
ChatGPT generates a decent function that downloads a zip file of factor data from the Kenneth R. French Data Library and extracts a CSV file. But we had to add nuanced instructions — “download zip file, unzip, read csv into Pandas DataFrame” — for ChatGPT to perform well.
Score: 4
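A hand-written sketch of the kind of function described here, not ChatGPT's actual output, appears below. The URL, file layout, and number of header rows are assumptions based on the current Kenneth R. French Data Library format and may need adjusting if the library changes.

```python
# Sketch: download a factor zip file from the Kenneth R. French Data Library,
# unzip it in memory, and read the csv into a Pandas DataFrame.

import io
import zipfile

import pandas as pd
import requests

FF_URL = ("https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/"
          "ftp/F-F_Research_Data_Factors_CSV.zip")


def download_factors(url: str = FF_URL) -> pd.DataFrame:
    """Download, unzip, and parse the monthly Fama-French factor file."""
    raw = requests.get(url, timeout=30).content
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        csv_name = zf.namelist()[0]
        # Skip the descriptive header lines; adjust skiprows if the layout changes.
        factors = pd.read_csv(zf.open(csv_name), skiprows=3, index_col=0)
    # Keep only the monthly rows (YYYYMM index); the annual block sits below them.
    is_monthly = [str(ix).strip().isdigit() and len(str(ix).strip()) == 6
                  for ix in factors.index]
    factors = factors[is_monthly]
    factors.index = pd.to_datetime([str(ix).strip() for ix in factors.index],
                                   format="%Y%m")
    return factors.astype(float) / 100.0  # convert percent to decimal returns
```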
2. Download Equity Returns Data
Again, ChatGPT writes a function that ultimately works. But again, we had to add more detail, such as "using get_data_yahoo, read csv into Pandas DataFrame," to make the function perform properly.
Score: 4
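A comparable hand-written sketch for the returns download is below, assuming pandas_datareader's get_data_yahoo is available; the Yahoo endpoint has been unreliable at times, so the yfinance package may be needed as a substitute.

```python
# Sketch: download adjusted closes with pandas_datareader's get_data_yahoo and
# convert them to monthly returns. Pass tickers as a list so the result is a DataFrame.

import pandas as pd
from pandas_datareader import data as pdr


def download_returns(tickers: list, start: str = "2015-01-01",
                     end: str = "2023-01-01") -> pd.DataFrame:
    """Download adjusted closes for a list of tickers and convert to monthly returns."""
    prices = pdr.get_data_yahoo(tickers, start=start, end=end)["Adj Close"]
    monthly_prices = prices.resample("M").last()  # month-end prices
    return monthly_prices.pct_change().dropna(how="all")
```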
3. Align the Dates in Our Downloaded Data
The data we downloaded, from the Kenneth R. French Data Library and Yahoo, have different date formats and frequencies. ChatGPT did not sort this issue for us, so we had to reformat dates and then write the code to align the two sets of data. This data wrangling is the most time-consuming and risky aspect of most data processes, and ChatGPT was of little help.
Score: 0
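The alignment step that ChatGPT skipped might look something like the sketch below, which converts both datasets to a monthly PeriodIndex and keeps only the overlapping months.

```python
# Sketch: align the French factor data (monthly, YYYYMM dates) with the Yahoo
# returns (month-end timestamps) on a common monthly PeriodIndex.

import pandas as pd


def align_monthly(factors: pd.DataFrame, returns: pd.DataFrame):
    """Put both DataFrames on a monthly PeriodIndex and keep only the overlap."""
    factors, returns = factors.copy(), returns.copy()
    factors.index = pd.DatetimeIndex(factors.index).to_period("M")
    returns.index = pd.DatetimeIndex(returns.index).to_period("M")
    common = factors.index.intersection(returns.index)
    return factors.loc[common], returns.loc[common]
```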
4. Use a Simple Factor Model to Forecast Returns
With ChatGPT, we can calculate stock-level factor loadings, but the expected returns it produces are based on the same factor returns we used to fit the model. In effect, it "forecasts" in sample, which is not helpful. So, we have to investigate where ChatGPT went awry and manually fix it, for example by applying the estimated loadings to a separate estimate of expected factor returns.
Score: 2
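One way to repair the forecast, sketched below, is to estimate each stock's loadings by ordinary least squares and then apply them to an estimate of expected factor returns; here we use the historical mean purely as an illustrative assumption, and we assume the factor and return data are already aligned with no missing values.

```python
# Sketch: estimate factor loadings per stock and forecast expected returns by
# applying those loadings to expected factor returns (historical mean, by assumption).

import pandas as pd
from sklearn.linear_model import LinearRegression


def forecast_expected_returns(factors: pd.DataFrame, returns: pd.DataFrame) -> pd.Series:
    """One expected monthly return per stock from a linear factor model."""
    factor_cols = [c for c in factors.columns if c != "RF"]  # drop the risk-free column
    X = factors[factor_cols].values
    expected_factors = factors[factor_cols].mean().values  # assumption: historical mean
    forecasts = {}
    for ticker in returns.columns:
        model = LinearRegression().fit(X, returns[ticker].values)
        forecasts[ticker] = model.intercept_ + model.coef_ @ expected_factors
    return pd.Series(forecasts, name="expected_return")
```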
5. Construct Portfolios and Run Simulations
The final simulation function misfires. It fails to generate expected returns for all of our stocks over all time periods in our data and isn’t an effective guide for portfolio construction decisions. It just calculates one expected return value for each stock.
We must intervene to loop through each time period and engineer the function to do what we want it to. A better prompt makes for better results.
Score: 1
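A sketch of the kind of looping simulation we describe, not our production code, appears below. It builds on the forecast_expected_returns sketch above, re-estimates forecasts on a trailing window each month, and holds the top-ranked stocks equal-weighted; the window length and quantile are illustrative choices only.

```python
# Sketch: loop through time, re-estimate expected returns on a trailing window,
# and hold the top-quantile stocks equal-weighted for the following month.

import pandas as pd


def backtest(factors: pd.DataFrame, returns: pd.DataFrame,
             window: int = 36, top_quantile: float = 0.2) -> pd.Series:
    """Hold the top-ranked stocks equal-weighted each month; return realized returns."""
    portfolio_returns = {}
    for t in range(window, len(returns)):
        # Re-estimate expected returns on the trailing window ending at t - 1.
        expected = forecast_expected_returns(factors.iloc[t - window:t],
                                             returns.iloc[t - window:t])
        n_top = max(int(len(expected) * top_quantile), 1)
        holdings = expected.nlargest(n_top).index
        period = returns.index[t]
        portfolio_returns[period] = returns.loc[period, holdings].mean()
    return pd.Series(portfolio_returns, name="portfolio_return")
```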
Develop an Entire Investment Pipeline Score: 1
2. Create a Machine-Learning, Alpha-Forecasting Function
Follow-up requests give us a simple machine-learning function, or template, to forecast stock returns. ChatGPT does a reasonable job here. It provides a function that we can then adjust and offers advice on how to apply it, recommending cross-validation for a random forest.
Create a Machine-Learning, Alpha-Forecasting Function Score: 4
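A minimal version of such a template, using a random forest with time-series cross-validation as ChatGPT itself recommends, might look like the sketch below; the features, target, and hyperparameters are placeholders.

```python
# Sketch: a random forest that maps a generic feature set to next-period stock
# returns, with time-series cross-validation to gauge out-of-sample fit.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import TimeSeriesSplit, cross_val_score


def fit_alpha_model(features: pd.DataFrame, target: pd.Series) -> RandomForestRegressor:
    """Fit a random forest return forecaster and report its cross-validated R^2."""
    model = RandomForestRegressor(n_estimators=200, max_depth=5, random_state=0)
    cv = TimeSeriesSplit(n_splits=5)  # respect time order when cross-validating
    scores = cross_val_score(model, features.values, target.values, cv=cv, scoring="r2")
    print(f"cross-validated R^2: {np.mean(scores):.3f}")
    return model.fit(features.values, target.values)
```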
3. Create a Useful Function: Target Shuffling
We next ask ChatGPT to write a helpful and moderately complex function to conduct target shuffling. Target shuffling is a permutation test that helps verify an investment model's outcomes: the target variable is repeatedly shuffled and the model refit so we can see whether the real model's performance stands out from results obtainable by chance. A simple request to "write Python code for a target shuffling function" does not give us much. Again, we had to input a detailed list outlining what we want before ChatGPT produced a reasonable template.
Create a Useful Function: Target Shuffling Score: 5
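For readers unfamiliar with the technique, a minimal hand-written version of a target shuffling test might look like the sketch below; scikit-learn's permutation_test_score offers a ready-made alternative. If many shuffled targets score as well as the real one out of sample, the model's apparent skill is suspect.

```python
# Sketch of target shuffling: refit the model many times on a permuted target and
# compare the real out-of-sample score with the distribution of shuffled scores.

import numpy as np
from sklearn.base import clone
from sklearn.model_selection import train_test_split


def target_shuffle_test(model, X, y, n_shuffles: int = 100, random_state: int = 0):
    """Compare the real out-of-sample score with scores achieved on shuffled targets."""
    rng = np.random.default_rng(random_state)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                        shuffle=False)  # keep time order
    real_score = clone(model).fit(X_train, y_train).score(X_test, y_test)
    shuffled_scores = np.array([
        clone(model).fit(X_train, rng.permutation(y_train)).score(X_test, y_test)
        for _ in range(n_shuffles)
    ])
    p_value = float((shuffled_scores >= real_score).mean())  # share of "lucky" shuffles
    return real_score, shuffled_scores, p_value
```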
Copilot Performance
As an adjunct to a fundamental analyst, ChatGPT functions reasonably well. Though detail is sometimes lacking on less-well-covered companies, the stock summaries demonstrate ChatGPT's speed and precision as an aggregator, at least when queries require no reasoning, subjectivity, or calculation. For ESG applications, ChatGPT has great potential, but once we identified a controversy, we could only drill down so far because the system's underlying data only go so deep.
ChatGPT excels at quickly and precisely summarizing earnings transcripts and other long-form text about companies, sectors, and products, which should free up time for human analysts to dedicate to other tasks.
While ChatGPT seems to disappoint as a quant copilot, it does add some value. To produce complex pipelines, ChatGPT needs precise prompts that require considerable time and intervention to construct. But with more specific functions, ChatGPT is more reliable and can save time. So overall, ChatGPT’s effectiveness as a copilot is largely a function of how well we engineer the prompts.
However, if we step things up and build an application on top of GPT-4, with refined prompts, cross-validated results, and structured outputs, we could significantly improve our results across the board.
Professional Standards, Regulation, and LLMs
What sort of implications do LLMs have for professional standards and regulation? In "Artificial Intelligence and Its Potential Impact on the CFA Institute Code of Ethics and Standards of Professional Conduct," CFA Institute raised important questions about their investment management applications, and obvious concerns remain about appropriate risk management, interpretability, auditability, and accountability around LLMs.
This is why the direct and uncontrolled application of ChatGPT responses to investment decision making is currently a nonstarter. But the technology is moving fast. Alphabet, for example, is working to provide sources for LLM responses, and further developments in so-called machine reasoning and causal machine learning may widen LLMs' applications still further. Nevertheless, current, raw LLM technology cannot satisfy the duty of care obligations intrinsic to investment management, which is why, absent access to the most sophisticated resources that can implement cross-validated and checked LLM responses, we advise against anything but the most peripheral use of LLMs.
LLMs: Future Applications in Investment Management
If analysis and investment indeed compose a mosaic, LLMs provide managers who understand the technology with a powerful tile. The examples above are simply ChatGPT prompts, but developers and managers with class-leading technology are already working to apply LLMs to investment management workflows.
In investment management, LLMs may already be at work on the following tasks:
Sense Checking
Portfolio managers could sense check investments with LLMs at a portfolio or even asset allocation level based on such criteria as ESG scandals or investment risks. This could ultimately be extended to institutional investing and robo-advisers.
Analyst Copilot
LLMs can help fundamental analysts quickly acquire basic knowledge about many companies at once. And quant analysts can use them to develop and debug code. Of course, there are risks and drawbacks that need to be carefully managed. The ChatGPT prompts we use above show one way to do this manually, but apps that write prompts automatically are likely to be available soon and should help achieve more detailed and specific objectives. Indeed, we expect a new tech arms race to develop.
Analyst Automation
Ultimately higher-tech systematic managers will harness LLMs to automate the research that fundamental analysts would otherwise conduct. But they will use this output as another input to their stock selection and investment models. For this to work, LLMs’ flaws, particularly those related to timeliness and logical or causal reasoning, will have to be addressed.
But even in their current form, well-integrated LLMs can create significant efficiencies if applied in the right way. And they hint at the technology’s vast potential.
In its next generation, LLM technology will become an indispensable investment management tool. By automating information gathering and other tasks, it will give human analysts more time and bandwidth to focus on the reasoning and judgment side of the investment process. This is only the beginning.
For further reading on this topic, check out The Handbook of Artificial Intelligence and Big Data Applications in Investments, by Larry Cao, CFA, from the CFA Institute Research Foundation.