VIEWPOINT

Vol. 138 No. 1627

DOI: 10.26635/6965.6979

An approach to make general practitioner referrals suitable for artificial intelligence deployment


Support for applying artificial intelligence (AI) to healthcare has recently been expressed at ministerial levels.1,2 In a healthcare sector beset by staff shortages and limited funding, AI is said to have “the potential for very high return on investment”.3 This viewpoint examines the potential for decision support, such as AI, to assist with triaging general practitioner (GP) referrals to cardiology outpatients at Health New Zealand – Te Whatu Ora Waitematā. The general principles and approach discussed have the potential to scale and extend more widely to other districts and specialities.

Keeping pace with the number of outpatient referrals is challenging for hospital specialities. The challenges include the sheer volume of referrals, with their year-on-year increase (Figure 1), as well as the need for timeliness in investigating and treating.4 Some cardiology conditions carry a mortality risk, making prompt assessment important for more than quality of life reasons alone. AI and conventional decision support techniques have the potential to assist, and we examine how current processes could be adapted for such deployment.

Many decision support tools could be applied to this problem. We have chosen two techniques that are particularly illustrative for clinicians seeking to understand the available options.

The two techniques used for illustration are human-designed decision trees and contemporary machine learning (ML). They are at opposite ends of the complexity spectrum of available methods. We suggest that, by employing two complementary approaches, the power of contemporary, sophisticated ML is harnessed while ensuring clinical safety through a simpler, more transparent technique, particularly during the initial deployment.

This viewpoint first defines relevant terms, then describes the current process for handling outpatient referrals, followed by a discussion of barriers to implementing decision support. We then describe a possible approach to addressing these issues by combining decision support techniques in a stepwise process. The aim is to maintain patient safety at every stage of the development process, yet culminate in maximising the benefit from contemporary sophisticated decision support tools. Our findings and proposals are informed by a survey of GP views (Appendix 2) and an analysis of declined referrals (Appendix 3).

This is an account “from the trenches” designed for the non-expert; a comprehensive position paper is available.3

Definition of terms

Decision support: decision tree versus ML

A decision tree (flow chart) can help GPs provide all relevant information by using structured questions. A decision tree breaks the decision into a series of simple “yes/no” questions, as per the examples in Appendix 4. The transparency of the decision process makes the tree educational for users. By forcing a stepwise assessment, the chart ensures that referrals are graded on a consistent set of criteria, thus reducing the variation that occurs between human triagers.
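As a purely illustrative sketch, a chest-pain branch of such a tree might look like the following. The questions, order and outcomes here are hypothetical and do not reproduce the draft trees in Appendix 4:

```python
# Hypothetical chest-pain triage branch. The questions and outcomes are
# invented for illustration and are NOT the Appendix 4 draft trees.

def triage_chest_pain(exertional: bool, breathless: bool,
                      known_ischaemic_disease: bool) -> str:
    """Walk a simple yes/no tree and return a triage outcome."""
    if exertional:                      # Q1: is the pain exertional?
        if breathless:                  # Q2: accompanied by breathlessness?
            return "accept - high priority"
        return "accept - routine"
    if known_ischaemic_disease:         # Q3: relevant cardiac history?
        return "accept - routine"
    return "decline - advice to GP"
```

Because every path through the tree is explicit, any referral's outcome can be traced question by question, which is what makes this form of decision support transparent and auditable.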

An ML model would learn from a large dataset of past referrals that have an agreed triaging decision. The model can detect patterns too subtle or complex for a simple decision tree—for instance, combinations of symptoms that, while individually mild, tend to lead to referral acceptance when seen together. Unlike the static decision tree, the ML model continues to learn as more referral and outcome data are fed in, making it adaptive and more accurate than rigid criteria. ML could either replace or augment the decision tree.
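For readers curious about what such training looks like in practice, the following minimal sketch fits a standard classifier to entirely synthetic structured referral data. The features, "reference decisions" and model choice are invented for illustration; a real system would be trained on historical referrals with agreed triage outcomes:

```python
# Minimal sketch of training a classifier on structured referral data.
# All features and labels below are synthetic and for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Four hypothetical yes/no features per referral, e.g. exertional pain,
# breathlessness, syncope, age over 65 (coded 1 = yes, 0 = no).
X = rng.integers(0, 2, size=(n, 4))
# Synthetic "reference decision": accept (1) when two or more are present.
y = (X.sum(axis=1) >= 2).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

# Triage decision for a new (synthetic) referral: exertional pain plus
# breathlessness, no syncope, under 65.
decision = model.predict([[1, 1, 0, 0]])[0]
```

The point of the sketch is the workflow, not the model: past referrals with agreed decisions go in, and the fitted model then scores new referrals, which is why a library of gold-standard decisions (discussed below) is a prerequisite.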

AI: ML compared with large language model (LLM)

ML refers broadly to algorithms that learn patterns from data to make predictions or classifications. In contrast, LLMs are statistical systems trained on vast amounts of text and other unstructured data to generate language that resembles human communication. In medicine, ML might be applied to imaging or laboratory data to predict disease or identify abnormalities, whereas LLMs can summarise patient records, draft clinical correspondence, provide natural language responses to medical queries or interpret text entries.

ML models offer a significant advantage over traditional approaches because they do not require researchers to fully specify the structure of relationships in advance. Instead, they sift through data and detect subtle or unexpected patterns that humans might never have anticipated, making them powerful tools for uncovering new insights. The trade-off, however, is that these models often operate as a “black box”, producing results without making clear why certain connections were drawn. This can lead to spurious or misleading associations being treated as meaningful, including the embedding of existing biases. For this reason, while ML expands the frontier of what can be uncovered, oversight remains essential to interpret results responsibly and to ensure that identified patterns are both accurate and relevant. An ML tool with excellent average performance will still produce a small percentage of incorrect decisions, and these may be consequential for individual patients.5

The problem

Overview of outpatient service and staffing

The current referral process entails a GP making an electronic, largely free-text referral. Once registered by the booking and scheduling clerk, the triaging cardiologist either accepts (and prioritises) the referral or declines it. The clerks then act on these decisions. There are no automated decision aids.

Problems with the current triage service

Processing cardiology outpatient referrals consumes a considerable amount of resources. At Waitematā, nine doctors and one nurse do not keep pace with the triaging of referrals, which increase by 1,374 each year (see Figure 1). The number of clerical staff required for this manual process is not quantifiable, as they are shared across departments. Clinical risk to patients increases with delays at every step, initially with referrals waiting to be triaged, then accepted referrals waiting to be seen and finally seen patients awaiting subsequent investigations (Table 1). Automation with appropriate decision aids could expedite the triage of referrals and thereby reduce wait times and associated clinical risk.

However, the current free-text format delivers varying amounts of information. Triaging cardiologists want referrals to consistently contain specific information relevant to the reason for referral. If automated decision aids are to be useful and trustworthy, this minimum dataset for every patient becomes mandatory.

Another issue with the current human-led service is the (understandable) variation in decisions between individual cardiologists (e.g., see variation in referral decline rates in Appendix 3). This is despite regional recommendations on triaging (Appendix 1). Automation has the potential to reduce variability with its more consistent decisions better reflecting clinical risk. However, training a decision aid is best done with a library of referrals, each with an agreed reference decision.

View Figure 1, Table 1–2.

A solution

Summary

We are concerned that applying black box ML to this dataset may be clinically risky without an intermediate step; for this reason, we suggest using human-designed decision trees. These two decision aids may prove complementary by feeding back to each other. We acknowledge that any decision aid can make mistakes, and the process needs to be structured to minimise the impact on individual patients. Table 2 explains some of the important concepts. Other suitable options, such as “explainable” AI, are not considered here.

Information from GPs that is sufficient and structured

Our illustrative strategy involves a first step of generating, for each patient, a sufficient minimum set of data relevant to the presenting symptom. The draft decision trees in Appendix 4 contain examples of relevant information for some common conditions; they will need to be further developed and improved. Taking chest pain as an example, it is essential to know whether it is exertional and whether it is accompanied by shortness of breath. Such a minimum dataset will help both the current human triagers and any subsequent automation.

This specific information required by cardiologists for each condition could be obtained from GPs via a structured form with tick boxes (still with room for free-text additions). The information would then be digitally incorporated into the hospital information system, relieving clerical staff of the chore of (error-prone) manual entry.

Without the use of structured forms, those patients lacking the minimum dataset of information are at risk of variable and erroneous decisions, whether made by humans or automation. It is not realistic to expect GPs to know what information specialists require. Our GP questionnaire found that only 29% of GPs are confident about the information required for a cardiology referral, and only 50% are confident about which conditions are typically seen by cardiology (Appendix 2).

Only limited information (variables) can be requested from the GP to avoid the process becoming too time-consuming. However, GPs are increasingly using “AI medical scribes”, which may facilitate gathering more information without them being overwhelmed. Nevertheless, there will always be substantial amounts of pertinent information missing from a referral. Some information is currently unavailable because it has not yet been obtained (e.g., through a future Holter monitor or echocardiogram), while other details are missing because it would be unreasonable to expect GPs to provide too much information. Furthermore, individual patients will have uncommon factors that are idiosyncratically predictive for them but are not captured by any manageable process. Only a small fraction of the many relevant variables (dimensions) can be captured. These limitations mean that the training of any automation carries a risk to individual patients that needs to be managed carefully.

Agreed end points for training decision support

To train automated decision aids, a gold-standard decision is needed for reference. Currently, no such standard exists, as human triagers (being human) exhibit variability in their decisions (Appendix 3). Such variation needs to be eliminated to supply a gold standard or reference decision for automation training. This could be achieved by a small number of cardiologists reaching consensus on cases to develop and validate decision trees. Human oversight would need to be an ongoing, iterative process, as new cases challenge the algorithm, clinical practice changes and thresholds alter due to changes in resourcing.3 Below, we suggest that this onerous human oversight of ML may possibly be devolved to decision trees.

Going straight to ML may be problematic

ML trained on these sparse representations of high-dimensional data carries a risk of unpredictable outcomes, which is unpalatable in clinical medicine.6 We suggest that success with ML will be more likely if the ground is prepared before deployment. This could be achieved iteratively: first transitioning from free-text electronic referrals to a structured referral form that can support the implementation of decision trees, before considering full-blown black box ML (Appendix 4). The idea of inserting a transitional state of decision trees between human triage and ML made even more sense when we became aware of a previous failed attempt to deploy AI for triaging cardiology referrals at Waitematā.7 We have been unable to obtain further information and so do not know the reason for failure.

Human-designed decision trees

Decision trees mimic how a cardiologist makes decisions. By making these thought processes explicit, transparent and fully explainable, decision trees will serve as a helpful bridge between human triaging and subsequent ML.

Decision trees function predictably with small numbers of variables, although at the expense of being biased.8 However, bias is easy to detect in decision trees, and their transparent nature allows identification and inclusion of the additional variables needed to mitigate the bias.

The act of cardiologists developing decision trees will facilitate the development of consensus reference decisions for subsequent training of ML.

Over time, decision trees may be able to replace humans in the task of ongoing oversight of ML decisions.

In a feedback loop, ML may identify new predictive variables that can then be incorporated into the structured questionnaire, enabling this variable to be obtained for all patients and thereby improving the predictive power of the ML.

Decision support makes mistakes

Like humans, any automated decision support will make mistakes. It may be that, as with self-driving cars, society will be less tolerant of automation error than of human error.9 Decision trees exemplify the concept of “satisficing”, which is finding a good-enough solution when it is not practical to find the optimal solution.10,11 They may work better than ML when there are many unknown variables (as here).12 In this setting, their output is more predictable than ML and more readily modified.3,12 Simple decision trees mimic the decision making of legal judges surprisingly faithfully and may perform similarly for cardiologists, however sophisticated we believe our own decisions to be.12

Decision trees or ML can only ever be “probably approximately correct”5 when evaluated on large numbers of patients. That means that decisions on individual patients have the potential to deviate sufficiently to be a clinical safety issue. At the outset, the error bounds for both the approximation and the probability are unknown, hence the need for oversight. We believe the human-designed decision trees provide the necessary safety, at least for the initial stages. Their deployment is carried out in a stepwise fashion, allowing for iterative refinement with minimal clinical risk and helping to maintain clinician trust.3 Regardless, some misclassification will occur at the conceptual level due to the challenge of crystallising the diversity of human symptoms into binary variables. However, if data collection is digital, misclassification due to data entry errors and data handling will be minimised.13

At the opposite end of the spectrum are the black box forms of ML, where it is not possible to explicate or understand the rationale behind the algorithm’s recommendation, or even which variables were used to predict the recommended outcome.

Sequence of deployment

ML is more powerful than decision trees, but, initially at least, it will be more prone to erratic and deviant results, given that the available information is sparse. ML needs a library of reference decisions before it can be trained.

Structured forms will provide more complete information, which is a necessary condition for any automation. Initial deployment of decision trees would be alongside, but invisible to, the human triagers. The discrepancies between human and algorithmic outcomes will be reviewed, and the algorithm will be refined accordingly. Next, the tree results are made visible to the human triagers for further refinement. Finally, a decision is made on whether some classes of referrals can be accepted or declined solely by the algorithm. There will be an indeterminate group where acceptance or rejection will need human input. Further iterations are performed to minimise the size of this indeterminate group. However, indeterminate presentations should never fall to zero, as there will always be complex and poorly differentiated cases. Forcing these into the algorithm risks misclassification.
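The “invisible” stage described above amounts to logging both decisions and auditing the disagreements. A minimal sketch of that discrepancy review, using invented decisions, might look like:

```python
# Sketch of the silent-deployment audit: the tree runs alongside human
# triagers and disagreements are collected for review. Decisions invented.

human     = ["accept", "decline", "accept", "accept",  "decline"]
algorithm = ["accept", "accept",  "accept", "decline", "decline"]

# Indices where the tree and the human triager disagreed.
discrepancies = [i for i, (h, a) in enumerate(zip(human, algorithm))
                 if h != a]
agreement = 1 - len(discrepancies) / len(human)

# The referrals at the discrepant indices would go to the consensus panel,
# and the tree would be refined before its output is made visible.
```

Tracking the agreement rate over successive refinement cycles also gives a simple, quantitative criterion for deciding when some classes of referral can safely be handled by the algorithm alone.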

Declined referrals should be accompanied by standardised information to the GP on why the referral was rejected, together with suggestions for management. This will assist the 42% of GPs who perceive that a referral has been declined inappropriately, and it functions as a just-in-time education tool for the GP referrer (Appendix 2).14

The next step is to train the ML model on the entire content of the referral, including both structured information and free text. The latter may be extracted by an LLM and fed into the ML algorithm. LLM output is probabilistic (i.e., the same information may yield a different result each time), and so adds an additional element of unpredictable variation that needs oversight to ensure the safety of individual patients.

It may be most useful for the decision trees and ML to proceed in parallel. If the ML identifies novel variables, these could be fed back and incorporated into the decision trees. As clinical safety is assured, increasing weight can be given to ML, which should eventually overtake the decision tree in its predictive ability.

Limitations

The current Auckland Region eReferral system is not suited to our proposals. We understand there is a project to review and upgrade the software, which should be better suited to delivering the dashboard views (Appendices 2 and 3).

This report is on the current system for referral triaging. However, this is embedded within the larger New Zealand health IT infrastructure, and the need for compatibility will influence which solutions are most appropriate.

The decision tree concept was developed with support and input from the Waitematā cardiology liaison GP, but has not yet been discussed more widely within the GP community. However, the orthopaedic service has successfully implemented decision trees with tick boxes and drop-down selections. Informal discussions with GPs indicate that they have accommodated the increased time required to complete the forms by scheduling a separate appointment specifically for the orthopaedic referral. They see an advantage in the form, as it provides an immediate answer as to whether the patient qualifies for joint replacement surgery and at what priority. This enables real-time discussion between the GP and the patient about the reasons for acceptance or rejection. GP practice systems will need to incorporate the decision tree and AI software, which will take time to implement. It may also be helpful to incorporate the existing GP guidelines (“health pathways”) into this software, providing more ready access to advice.14

Insufficient information is available for a quantitative cost–benefit analysis. There are alternative approaches to this problem, utilising different automation tools. We view the approach outlined here as particularly illustrative for those unfamiliar with decision support tools.

Conclusion

There are many opportunities for AI to assist healthcare. This viewpoint examines the potential for automated decision support, including AI, to assist in triaging GP referrals. It has the potential to improve efficiency, reduce personnel requirements and provide more consistent decisions when compared with human triagers. We review an approach that is illustrative for those unfamiliar with decision support, while acknowledging that other options will be suitable.

The volume of GP referrals is substantial and poses significant challenges. However, two steps are required to prepare the ground for automation. Firstly, adequate information is crucial, especially for black box ML. A minimum dataset is necessary for every patient, which requires switching to a structured referral form instead of free text. Secondly, a library of historical referrals with reference (gold standard) decisions is needed for the training of automation.

We suggest that human-designed decision trees can complement contemporary black box ML by mitigating the risk of erroneous decisions that may affect the safety of individual patients.

View Appendices.

Outpatient referrals for hospital specialist assessment are an increasing workload that carry significant risk if not attended to in a timely manner.
This viewpoint discusses how decision support (including artificial intelligence and machine learning) may address this problem. Of the many possible approaches, we choose a combination of two that illustrate the breadth of available tools and how they combine to complement each other.
To understand the issues and inform this discussion, a survey of general practitioners’ views was conducted (Appendix 2), an audit of declined referrals was undertaken (Appendix 3) and draft decision trees were constructed (Appendix 4).
To have data suitable for automated decision support, the current referral needs to change from free text to a structured format that ensures every patient has a complete minimum dataset. Triaging decisions currently vary between individual clinicians, yet decision support tools must be trained on a set of referrals with an agreed gold-standard decision. To maintain patient safety throughout, the process needs to be incremental. We suggest that one way to assure patient safety is to combine simple decision trees with sophisticated contemporary machine learning.

Authors

Evelyn Lesiawan: Advanced Physician Trainee, Health New Zealand – Te Whatu Ora, New Zealand.

Bruce Sutherland: General Practitioner, Kawau Bay Health, Warkworth, Auckland, New Zealand.

Christoph Schumacher: Professor of Economics, School of Economics and Finance, Massey Business School, Albany, Auckland, New Zealand.

Andrew Cave: Digital Hospital Implementation Lead, Data & Digital, Integration & Delivery, Health New Zealand – Te Whatu Ora Northern, New Zealand.

Guy Armstrong: Cardiologist, Health New Zealand – Te Whatu Ora Waitematā, Auckland, New Zealand.

Correspondence

Guy Armstrong: Lakeview Cardiology, North Shore Hospital, Private Bag 93 503, Takapuna, Auckland, New Zealand.

Correspondence email

guy.armstrong@waitematadhb.govt.nz

Competing interests

Nil.

1)       Bhamidipati S. Experts cautiously optimistic about expanding AI in education, health sectors [Internet]. RNZ; 2024 Jul 3 [cited 2024 Jul 22]. Available from: https://www.rnz.co.nz/news/national/521217/experts-cautiously-optimistic-about-expanding-ai-in-education-health-sectors

2)       Espiner G. AI for school tutoring, instant medical analysis part of NZ’s future - Judith Collins [Internet]. RNZ; 2024 Jul 3 [cited 2024 Jul 22]. Available from: https://www.rnz.co.nz/news/in-depth/521123/ai-for-school-tutoring-instant-medical-analysis-part-of-nz-s-future-judith-collins

3)       Office of the Prime Minister’s Chief Science Advisor. Capturing the benefits of AI in healthcare for Aotearoa New Zealand [Internet]. [cited 2025 Jun 15]. Available from: https://www.pmcsa.ac.nz/artificial-intelligence-2/ai-in-healthcare/

4)       Health New Zealand – Te Whatu Ora. Health Delivery Plan [Internet]. 2025 Jul 17 [cited 2025 Oct 1]. Available from: https://www.tewhatuora.govt.nz/publications/health-delivery-plan

5)       Valiant L. Probably Approximately Correct. Basic Books; 2013. p. 57-86.

6)       Li R, Kumar A, Chen JH. How Chatbots and Large Language Model Artificial Intelligence Systems Will Reshape Modern Medicine: Fountain of Creativity or Pandora’s Box? JAMA Intern Med. 2023 Jun 1;183(6):596-597. doi: 10.1001/jamainternmed.2023.1835. 

7)       NZTech. AI about to speed up hospital processes – big time [Internet]. NZTech. 2019 [cited 2024 Aug 21]. Available from: https://nztech.org.nz/2019/10/31/ai-about-to-speed-up-hospital-processes-big-time/

8)       Yarkoni T, Westfall J. Choosing Prediction Over Explanation in Psychology: Lessons From Machine Learning. Perspect Psychol Sci. 2017 Nov;12(6):1100-1122. doi: 10.1177/1745691617693393. 

9)       Mayer MM, Buchner A, Bell R. Humans, machines, and double standards? The moral evaluation of the actions of autonomous vehicles, anthropomorphized autonomous vehicles, and human drivers in road-accident dilemmas. Front Psychol. 2023 Jan 4;13:1052729. doi: 10.3389/fpsyg.2022.1052729.

10)    Simon HA. Rational choice and the structure of the environment. Psychol Rev. 1956;63(2):129-138. doi: 10.1037/h0042769.

11)    Artinger FM, Gigerenzer G, Jacobs P. Satisficing: Integrating Two Traditions. Journal of Economic Literature. 2022 Jun;60(2):598-635. doi: 10.1257/jel.20201396.

12)    Katsikopoulos KV, Şimşek Ö, Buckmann M, Gigerenzer G. Classification in the Wild: The Science and Art of Transparent Decision Making. The MIT Press; 2021. doi: 10.7551/mitpress/11790.001.0001.

13)    Lash T, VanderWeele T, Haneuse S, Rothman K. Chapter 13: Measurement and Measurement Error. In: Modern Epidemiology. 4th ed. Wolters Kluwer; 2020.

14)    Wikipedia. Just-in-time learning [Internet]. Wikipedia. 2024 [cited 2024 Jul 23]. Available from: https://en.wikipedia.org/w/index.php?title=Just-in-time_learning&oldid=1221408288