What You Should Know About Healthcare AI (but Were Afraid to Ask)

Access to increasing volumes of scientific and clinical data, especially with the implementation of electronic health records, has reignited an enthusiasm for artificial intelligence and its application to the health sciences. This interest has reached a crescendo in the past few years with the development of several machine learning– and deep learning–based medical technologies. The impact on research and clinical practice within gastroenterology and hepatology has already been significant, but the near future promises only further integration of artificial intelligence and machine learning into this field. The concepts underlying artificial intelligence and machine learning initially seem intimidating, but with increasing familiarity, they will become essential skills in every clinician's toolkit. In this review, we provide a guide to the fundamentals of machine learning, a concentrated area of study within artificial intelligence that has been built on a foundation of classical statistics. The most common machine learning methodologies, including those involving deep learning, are also described.

Keywords

  • Big Data
  • Deep Learning
  • Neural Networks
  • Statistical Learning
  • Supervised Learning

Abbreviations used in the paper:

AI (artificial intelligence), CNN (convolutional neural network), DL (deep learning), EHR (electronic health record), LSTM (long short-term memory), ML (machine learning), RNN (recurrent neural network), SVM (support vector machine)

Since the beginning of the 21st century, there has been an increased impetus to integrate the field of artificial intelligence (AI), including machine learning (ML), into the medical sciences. This involvement has gained momentum over the last few years with myriad discoveries in AI-based methodologies for clinical practice and decision support; significantly, the impact of these technologies has been especially broad in gastroenterology and hepatology.

While clinical applications of AI have gathered special attention, basic science disciplines have also readily adopted ML techniques, using these tools to expand into new facets of genomics and proteomics.

The relationship between AI and medicine, however, is not new, and it extends back many decades. The origins of AI can be traced to the fictional writings of Isaac Asimov in the 1940s and the seminal work of Alan Turing on computing machines during World War II. The term "artificial intelligence", however, was not used until 1956, when John McCarthy organized the conference that established the field of AI: the Dartmouth Summer Research Project on Artificial Intelligence. Medical schools in the United States were quick to partner with pioneers in this nascent field, and health science research became the driving force for AI innovation in the 1970s with the development of several "expert systems" to aid with scientific and clinical decision-making. Hindered by rigid, rule-based architectures, these AI-based "expert systems" failed to be generalizable, precluding widespread adoption and leading to a divergence between the fields of AI and medicine until the turn of the century.

AI, in the current day, is a loosely defined term applied to a wide area of study within computer science devoted to computing systems that can perform skills normally thought to require human intelligence, such as problem-solving, visual perception, and reasoning, whereas ML is a specific set of techniques within AI that are predicated on "learning" to model patterns in data using mathematical functions. This mathematical foundation of ML is largely built on the concepts of traditional statistics (Figure 1), and thus, ML is often referred to as "statistical learning".

By leveraging its origins in computer science, ML diverges from classical statistics with its ability to use higher-dimensional mathematical operations on much larger data sets to decipher complex, nonlinear relationships. As a result, ML algorithms have proved to be very useful in medicine to discriminate between groups of interest or predict specific outcomes. In fact, models such as the Bhutani nomogram, the Model for End-stage Liver Disease score, and the Glasgow-Blatchford risk score could be considered some of the earliest examples of predictive ML models in the field of gastroenterology and hepatology.

The progression of computing processing power, the ability to reliably store immense amounts of data, and the development of statistics-based ML techniques, combined with the implementation of electronic health records (EHRs), have heightened the interest in medical AI. The potential applications of ML algorithms in research and the practice of gastroenterology and hepatology remain vast. As this technology, and its acceptance, continues to advance, such algorithms will take an ever-increasing role in every facet of gastroenterology. Thus, having a working understanding of the basic concepts of AI and ML has become a necessary tool in every clinician's skillset. Here, we aim to provide a primer on the foundations of ML and introduce some of the most common model architectures used in medicine.

The Basics of ML

Data

ML techniques are designed to understand and mathematically represent the patterns present in data. As a result, the key to building accurate and applicable ML algorithms lies both in the size and quality of the data used. The omics revolution and widespread deployment of the EHR have provided us with innumerable data sets with previously unfathomable amounts of experimental and clinical information that have been crucial for ML applications in the health sciences. However, the components and arrangement of these large volumes of data present a challenge in understanding and verifying their underlying quality. The wide variety of building blocks in medical data includes pixels that make up radiological and histopathological images, words as part of clinical documentation, time-based values from remote sensors, nucleotide bases from next-generation sequencing, and, at the simplest level, rows from descriptive tables. Some of these base constituents can easily be stored in known formats and organized into a table or sets of relational tables and, thus, are called structured data (Table 1). Although organization into structured data does not directly indicate the quality or interpretability of the information contained within, the ability to use a structured architecture to index and search for specific instances allows for easier verification of quality. Alternatively, the components of images, clinical documentation text, or even audio recordings have no predefined relational organization and are considered unstructured data. Occasionally, these sources of unstructured data can be organized at a higher level using metadata (information describing where, when, and how the data were created) and are then referred to as semi-structured data. Although more advanced ML techniques such as deep learning (DL) can, in some cases, use unstructured data, traditional ML methodologies tend to require use of structured data.

Table 1. Glossary of Terms

Structured data: Data that have been stored in a defined, known framework, such as a database, so that they can be indexed, referenced, or searched easily and accurately. Structured data are usually quantitative and composed of numerical values, dates, or short text strings. EHRs frequently store laboratory results and flowsheet information (eg, vitals) in a structured format. This term purely reflects the organization of data and does not reflect on its content.

Unstructured data: Data that are not stored in a well-defined framework that can be referenced. Most clinical data that are accumulated are unstructured; this includes the text from clinical documentation, radiology images, endoscopy videos, and scanned reports. Because of the lack of a storage framework, unstructured data often require manual extraction of information. For instance, one cannot reference the liver in computed tomography images simply by selecting a filter for "liver pixels"; this usually requires manual identification for each set of images.

Features: The input variables in a data set, also known as the independent variables, predictors, or simply the variables. Features can simply be categorical values or continuous numerical ranges, or more complex components such as groups of pixels in an image. New features can be created by combination or transformation; this is known as feature engineering.

Ground truth: The result or output variable that is used to train or to test a model's prediction or classification. This is usually a measured variable or one that has been determined by domain experts and is considered the gold or reference standard.

Class/Label: The model's output variable in a classification problem. If the outputs are mutually exclusive, they are known as classes; if not, they are referred to as labels. A model to determine if a polyp was cancerous would theoretically output "cancerous" or "non-cancerous" as classes. However, a model to identify various structures on a liver biopsy slide could output several labels for each slide such as "portal vein", "central vein", "steatosis", and "fibrosis".

Loss/cost function: Mathematical functions that calculate the difference between the ground truth and the model's predicted values during the training of a model. This function is minimized to optimize the model's prediction.

Parameters: The main adjustable factors available to a model to optimize its performance during training. These are analogous to coefficients in statistical regression and can also be considered the weights applied to each feature.

Hyperparameters: Adjustable factors, also known as tuning parameters, that determine how advanced models are set to learn from the data. These are adjusted before training only and can be evaluated based on performance on the tuning set. Examples include the regularization function in ridge regressions, the k value in k-nearest neighbors, and the learning rate and depth in neural networks.

Training set: The subset of the data that is used to train a model. These are the data on which various combinations of parameters are adjusted to minimize the loss function and establish an optimal model.

Test set: Data that are used to evaluate a model's performance. These should be data that the model has not been trained or tuned on at any point. As such, it can be a small subset of the data that was held out or a completely external set of similar data that can be used for evaluation. Usually should be at least 20% of the training set in size.

Tuning set: Sometimes referred to as the "validation" set, this is a small subset of the data available for model development. These data are used in advanced machine learning models to adjust the hyperparameters of the model to ensure that the model is not overfitting or underfitting data that was not used for training.

Overfitting: The situation where a model has been trained to be very specific to the data contained in the training set and is thus not generalizable. This will be reflected by excellent performance metrics on the training data but poor performance on tuning or test sets. This can occur if too many features are used in a model.

Underfitting: A model is underfitting if it continues to perform poorly on the training set despite all hyperparameter optimization. This is indicative that the model framework is a poor candidate for representing the relationships in the data.

Recall: The sensitivity or true positive rate of the model's predictions.

Precision: The positive predictive value of the model's predictions.

Accuracy: Used to evaluate the performance of a binary classifier. Defined as the ratio of correct predictions to the total number of predictions.

F-score: Another measure of the accuracy of a binary classifier model. The traditional version of this metric is the F1-score, which represents the harmonic mean of the precision and recall.

c-statistic: A term for the value of the area under the receiver operating characteristic curve (AUROC or AUC). Used to evaluate the performance of a binary classifier.

The composition and organization of data sets can take a variety of forms, but in general, they are considered a collection of unique points or observations with several variables. Each unique point is defined by the values of its variables. When modeling data, using either classical statistics or ML, certain variables are considered as candidate input variables, also known as independent variables or predictors, but referred to as features in ML. If a variable representing the output of the model is present, it is referred to as the dependent or output variable. Features, depending on the property they are describing, can either be categorical, as a simple binary or a set of discrete values, or a range of continuous numerical values. In addition, features can be created anew by combination, mathematical transformation, or conversion from continuous to categorical, in a process known as feature engineering.
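
To make feature engineering concrete, the following is a minimal sketch, assuming pandas; the column names and values are invented for this illustration:

    import pandas as pd

    # Hypothetical structured data set: one row per patient (invented values).
    df = pd.DataFrame({
        "weight_kg": [70.0, 95.5, 60.2],
        "height_m": [1.75, 1.80, 1.62],
        "total_bilirubin": [0.8, 2.4, 1.1],
    })

    # Feature engineering by combination: derive BMI from two features...
    df["bmi"] = df["weight_kg"] / df["height_m"] ** 2

    # ...and by conversion from continuous to categorical (invented cutoff).
    df["hyperbilirubinemia"] = (df["total_bilirubin"] > 1.2).astype(int)

    print(df)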

Model Considerations

The "learning" aspect of ML corresponds to the initial training stage of building a model, where an ML model is trained on, or "learns" from, a representative data set. This can occur in two main frameworks: supervised learning or unsupervised learning. Data that accept been labeled with an output variable using a predetermined gilt or reference standard, known as the ground truth, by subject affair experts or through actual measurement can be used to train a supervised model. Supervised learning leverages the footing truth information equally an anchoring endpoint to guide the training of a model. Conversely, unsupervised techniques are designed to be used on data sets without footing truth. Without this predefined output, unsupervised learning relies on identifying distinctive groups of patterns within the features provided. Another distinct process, semi-supervised learning, is a combination of these ii frameworks that is used when simply a relatively small corporeality of data has bachelor footing truth. This technique learns from both the subset with basis truth every bit well every bit the patterns in the remaining information to build a series of iterative models that assign all instances with a tentative output variable. These derived outputs are then used to train a final model. Defective the information contained in an established endpoint, techniques that use semi-supervised or unsupervised learning crave much larger quantities of data to attain the same level of performance as those using supervised learning. As a result, nearly high-performing ML models in the health sciences tend to apply supervised learning techniques.

Regardless of the learning framework, the applicability of all ML models is heavily dependent on the sources of the data used. This is especially impactful in the use of ML models in health care, as these data are often plentiful but also usually restricted to large, academic referral centers in industrialized, western countries as sources. These populations are often imbalanced in terms of disease severity and demographics, which in turn can result in similarly skewed model predictions. Without the presence of ground truth, skewed data can unduly influence models created with unsupervised learning methodologies, and extra scrutiny is required to determine the scope of such models.

Supervised learning models, however, are not immune to this phenomenon, as both the source of the features and the source of ground truth (eg, from one expert or measurement vs from a consensus of multiple experts or measurements) can affect the model's results. The accumulation of more representative data is usually more labor intensive and often involves combining several sources. This trade-off between model applicability and data collection effort is one of the several competing interests to be considered when choosing how to build an ML model.

ML can be used for understanding a data set with two overarching, yet competing, goals in mind: pattern inference or outcome prediction. Methodologies that optimize one usually do so at the cost of the other. Models with good pattern inference simplify the overarching relationships between candidate features and outcomes and, as a result, tend to have less accuracy in their predictive abilities. Conversely, the most accurate predictive models can rely on deducing very complex relationships between the features and outcomes, rendering an understandable explanation very difficult. Another way to view the trade-offs between these competing paradigms is to compare discriminative and generative methodologies. Discriminative models concentrate on calculating the most efficient boundary between different outcomes, almost completely ignoring the overall distribution of the outcomes. Generative models, on the other hand, build a full representation of the distribution of each outcome without specifically focusing on separating them.

These trade-offs between predictability and interpretability are important to recognize in some of the most common ML algorithms applied to medicine. Supervised algorithms that are used to place data points into discrete groupings are known as classifiers and are generally referred to as solving a "classification problem". If the groupings output by a classifier are mutually exclusive, they are known as classes but otherwise are called labels. Classifiers are prototypical discriminative models and can frequently lack information on the features driving the discrimination. The solutions to "regression problems" are more interpretable models, using classical and nonlinear regression methodologies, that can output continuous numerical values based on a set of features, balancing accuracy and predictive abilities. Unsupervised clustering algorithms that are used to group unlabeled data points also tend to balance discriminative and generative features.

Training a Model

The foundation of building an ML model is based on optimizing the mathematical operations applied to the input features to achieve the closest possible outcomes or predictions to the ground truth. In terms of the classic linear regression models, this can be simplified to choosing the coefficients for each independent variable such that the difference between the model prediction and ground truth is minimized. This difference, with regard to regression analysis, is measured using the mean squared error function, which calculates the average squared difference between the actual and predicted outcome values. With more advanced ML models, higher-level functions such as cross-entropy loss estimate this deviation better than mean squared error. In general, these functions that capture the difference between model predictions and ground truth are known as loss functions or cost functions.
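
A minimal sketch of these two loss functions, assuming NumPy; the toy labels and predicted probabilities are invented:

    import numpy as np

    def mean_squared_error(y_true, y_pred):
        # Average squared difference between actual and predicted values.
        return np.mean((y_true - y_pred) ** 2)

    def binary_cross_entropy(y_true, p_pred, eps=1e-12):
        # Penalizes confident wrong predictions much more heavily than MSE.
        p = np.clip(p_pred, eps, 1 - eps)
        return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

    y_true = np.array([1, 0, 1, 1])          # ground truth labels
    p_pred = np.array([0.9, 0.2, 0.6, 0.8])  # predicted probabilities
    print(mean_squared_error(y_true, p_pred))    # 0.0625
    print(binary_cross_entropy(y_true, p_pred))  # ~0.266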

During training, an ML model learns by varying its parameters, adjustable values specific to each feature and analogous to coefficients in regression models, to minimize its loss function. This process occurs on a subset of the total data set that has been prespecified to be the training set. Once the optimal parameters have been chosen, the performance of the model can be evaluated on a test set. A test set is data structured exactly like the training set but that the model has never seen, either as a hold-out subset of the initial data set or from a completely external source. Traditionally, the test set should be at least 20% the size of the training set. Poor performance on the test set can be indicative of either underfitting or overfitting. Underfitting commonly shows poor performance in both the training and test sets and indicates that the model framework, even when optimized, is a poor candidate to represent the relationships present in the data. Overfitting will show excellent performance in the training set but poor performance in the test set; this is usually an indicator that the model has been overly optimized for the specific data points in the training set and thus has lost generalizability to data it has not seen.

While traditional regression models are limited in how they are trained, more advanced regression and ML models have the option of adapting how they learn based on their tuning parameters. These tuning parameters, or hyperparameters, are also adjustable values but do not form a part of the model and are established before training. However, hyperparameters have a large impact on the final parameters and model performance and thus form an important part of optimizing the model. In advanced regression models, required hyperparameters include the regularization function, which allows for feature selection, whereas other ML and DL models can have several hyperparameters, including the learning rate as well as the size and complexity of the network architecture. As hyperparameters are not adjusted during training, their effects need to be specifically monitored. This is commonly performed using a tuning set before the model is finalized and evaluated on a test set.
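
This train/tune/test workflow can be sketched as follows, assuming scikit-learn; the synthetic data and the grid of regularization strengths searched are invented for illustration:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Ridge
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    # Synthetic structured data: 500 observations, 20 features.
    X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)

    # Hold out a test set the model never sees during training or tuning.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
    # Carve a tuning ("validation") set out of the remaining data.
    X_train, X_tune, y_train, y_tune = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

    # Tune the regularization hyperparameter (alpha) on the tuning set only.
    best_alpha, best_mse = None, np.inf
    for alpha in [0.01, 0.1, 1.0, 10.0]:
        model = Ridge(alpha=alpha).fit(X_train, y_train)
        mse = mean_squared_error(y_tune, model.predict(X_tune))
        if mse < best_mse:
            best_alpha, best_mse = alpha, mse

    # Final evaluation on the untouched test set.
    final = Ridge(alpha=best_alpha).fit(X_train, y_train)
    print(best_alpha, mean_squared_error(y_test, final.predict(X_test)))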

Evaluating a Model

As described previously, ML models are evaluated based on their performance on a test set. This evaluation is very similar to traditional biostatistical metrics, albeit with slightly different nomenclature. The most common metrics include recall (sensitivity) and precision (positive predictive value). In addition, the ratio of correct predictions to total predictions (known as accuracy) and the F-score are used to quantify an ML model's accuracy. Visually, the model performance is often shown as a receiver operating characteristic curve with additional reporting of the area under the curve, also known as the c-statistic, as a quantitative measure of performance. Increasingly, the precision-recall plot and its area under the curve are also being reported, particularly in the setting of imbalanced data sets.
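
A minimal sketch of these evaluation metrics, assuming scikit-learn; the toy predictions are invented:

    from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                                 recall_score, roc_auc_score)

    y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # ground truth classes
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # predicted classes
    y_prob = [0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1]  # predicted probabilities

    print("recall (sensitivity):", recall_score(y_true, y_pred))
    print("precision (PPV):", precision_score(y_true, y_pred))
    print("accuracy:", accuracy_score(y_true, y_pred))
    print("F1-score:", f1_score(y_true, y_pred))
    print("c-statistic (AUROC):", roc_auc_score(y_true, y_prob))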

Types of Models

Classical Supervised ML Algorithms

Regression models, as alluded to previously, form the crucial link between ML and statistics. Linear regression (Table 2) is the simplest and most well-known of these models, with the lowest computational cost. Although such models offer great interpretability and produce quantitative rather than class predictions, they are not able to account for the several nonlinear relationships found in medical data. However, several nonlinear regression techniques, including polynomial and stepwise regressions as well as regression splines, have been developed to enhance their flexibility. In addition to nonlinear regressions, techniques such as ridge, elastic net, and least absolute shrinkage and selection operator (LASSO) are regression frameworks that allow for hyperparameter tuning to assist with feature selection.

Table 2. Classical Supervised Machine Learning Techniques

Linear regression: Classical statistical model that calculates a "line of best fit" between inputs and output. Can be made more flexible to estimate limited nonlinear relationships by using stepwise regressions and splines.

Advanced regression (ridge, LASSO, elastic net): Expansion of linear regression models with a regularization hyperparameter. Allows for feature selection but, like linear regression, cannot estimate complex relationships.

Support vector machine: Discriminative classification technique for both linear and nonlinear relationships. Uses a kernel function to mathematically transform each data point into a higher-dimensional feature space so that a hyperplane (high-dimensional geometrical plane) separates groups.

Decision trees: Simple, yet versatile, model that uses several levels of branched decision points (nodes) based on feature values that end in groupings of final nodes called leaves. The simplicity allows for straightforward interpretation and feature selection, but it is difficult to model complex relationships.

Random forest: Ensemble model created by using many decision trees that act as a committee. This allows for better classification of complex relationships; however, it is more difficult to interpret.

Gradient boosting: Another ensemble model that uses decision trees in a variety of stage-wise arrangements to improve classification performance, particularly in complex relationships. Models are fairly interpretable but computationally expensive.

k-nearest neighbors: Nonparametric classification model that uses Euclidean distance from ground truth labels to classify new data. Can often be used for tentative labeling as part of a semi-supervised learning model.

Naïve Bayes: Classification model that uses Bayes' rule to assign the probability of belonging to each class based on the ground truth information. Relies on the independence of features, so it can have poor performance if several features are not independent of each other.

LASSO, least absolute shrinkage and selection operator.
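
As a concrete illustration of the regularized regressions described above, a minimal sketch assuming scikit-learn and synthetic data; LASSO is shown because its penalty shrinks uninformative coefficients to exactly zero, which is the feature selection behavior described:

    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso

    # Synthetic data in which only 5 of 20 features are truly informative.
    X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                           noise=5.0, random_state=0)

    # alpha is the regularization hyperparameter; larger values shrink
    # more coefficients all the way to zero.
    lasso = Lasso(alpha=1.0).fit(X, y)
    selected = [i for i, coef in enumerate(lasso.coef_) if coef != 0]
    print("features retained by the model:", selected)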

Support vector machines (SVMs) are another traditional classification model that builds on the regression methods but is not restricted to linear relationships. SVMs attempt to discriminate between classes of data points by using a series of mathematical operations, known as the kernel function, to transform each data point into a high-dimensional feature space such that a hyperplane can be used to separate classes. Depending on the structure of the kernel function, SVMs can model both linear and nonlinear relationships.
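
A minimal SVM sketch, assuming scikit-learn; the concentric-circles toy data illustrate a nonlinear relationship that no linear boundary can separate:

    from sklearn.datasets import make_circles
    from sklearn.svm import SVC

    # Two concentric rings of points: not linearly separable in 2 dimensions.
    X, y = make_circles(n_samples=300, factor=0.5, noise=0.05, random_state=0)

    # The radial basis function (RBF) kernel implicitly maps each point into
    # a higher-dimensional space where a separating hyperplane exists.
    clf = SVC(kernel="rbf").fit(X, y)
    print("training accuracy:", clf.score(X, y))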

There are several tree-based methods in ML that can be used for regression and classification problems. The simplest of these are the regression and classification decision trees, which create branches of decisions based on quantitative and qualitative rules, respectively. The branched decision points, or internal nodes, are optimized to produce the most accurate groupings at the end of the decision tree as terminal nodes, or leaves. Although decision trees allow for interpretability, they do not often perform well at classifying complex medical data. As a result, decision trees are often combined into ensemble models such as random forests and gradient boosting models, such as XGBoost. While random forests combine several decision trees to act as a committee, gradient boosting applies weak decision trees in a stage-wise fashion to achieve better accuracy. Ensemble models can be particularly powerful classifiers with added feature selection properties but do require increased computing power.
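
A minimal sketch of both ensemble approaches, assuming scikit-learn with synthetic data:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, n_features=15, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # A committee of trees, each trained on a bootstrap sample of the data.
    forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
    # Weak trees added stage-wise, each correcting the previous stage's errors.
    boosted = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

    print("random forest test accuracy:", forest.score(X_test, y_test))
    print("gradient boosting test accuracy:", boosted.score(X_test, y_test))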

Other classical models such as the k-nearest neighbors and naïve Bayes classifiers are simple models that do not use regression-based frameworks. The k-nearest neighbors model is a nonparametric classifier that bases its prediction on the Euclidean distance between feature vectors. Although often used as a baseline when comparing the performance of several ML classifiers, k-nearest neighbors can also be applied for semi-supervised tentative labeling of the test set. The naïve Bayes method applies Bayes' rule to each data point to calculate the probability of belonging to each class, assuming that the features are independent of each other. A simple, yet robust classifier, the naïve Bayes classifier is quite popular but can underperform in situations where several features are not independent.
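
A minimal sketch of these two classifiers, assuming scikit-learn and synthetic data:

    from sklearn.datasets import make_classification
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=300, n_features=8, random_state=0)

    # k is a hyperparameter: the number of nearest neighbors consulted
    # (by Euclidean distance) for each prediction.
    knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
    # Gaussian naive Bayes assumes the features are independent given the class.
    nb = GaussianNB().fit(X, y)

    print(knn.predict(X[:3]), nb.predict(X[:3]))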

Classical Unsupervised Techniques

The use of unsupervised learning is rare in clinical medicine because of the need for large, well-organized, structured data sets. However, unsupervised clustering and dimensionality reduction methods are widely used with sequencing in basic and translational medical research, particularly single-cell techniques, such as single-cell RNA-seq and single-cell ATAC-seq. These sequencing techniques produce hundreds of thousands of data points which are amenable to several clustering methods. The k-means clustering methodology (Figure 2) is the most common framework for unsupervised classification. It is similar to the k-nearest neighbors supervised model in that it uses the Euclidean distance to classify data points, but instead of prespecified classes, new clusters are created for the data. The variable k functions as a hyperparameter indicating how many clusters should be created.

Figure 2. Unsupervised models and neural networks. (A) k-means clustering: unsupervised classification based on Euclidean distances between groups. (B) Principal components analysis (PCA): unsupervised dimensionality reduction to a small number of features; small arrows indicate reduction to two linear feature vectors which preserve a large majority of the variation in the original data set. (C) Multilayer perceptron: a simple neural network with an input layer, output layer, and a hidden layer in between, with all layers fully connected. (D) Convolutional neural network: a neural network with spatial connections between layers, with several hidden layers between input and output layers.
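
A minimal k-means sketch, assuming scikit-learn; the synthetic "blob" data stand in for unlabeled observations such as single-cell profiles:

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    # Synthetic data with 3 latent groups; no ground truth labels are used.
    X, _ = make_blobs(n_samples=600, centers=3, random_state=0)

    # k (n_clusters) is a hyperparameter: how many clusters to create.
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    print(kmeans.labels_[:10])      # cluster assignment per data point
    print(kmeans.cluster_centers_)  # centroid of each cluster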

As molecular dynamics and sequencing data can have innumerable features, the adoption of unsupervised dimensionality reduction, or transforming each data point into a lower dimension (fewer features) while preserving key relational attributes, has become very important. Dimensionality reduction is often necessary to avoid overfitting in ML models but can also be used to visually represent the key features in large, unlabelled data sets such as sequencing data. Principal components analysis (Figure 2) is the most common procedure used for dimensionality reduction; it is an unsupervised approach that reduces the data set to a much smaller number of features, known as principal components, while still maintaining most of the variation and patterns from the original data set. Uniform Manifold Approximation and Projection and t-distributed stochastic neighbor embedding are other unsupervised dimensionality reduction techniques which tend to preserve local structure between data points better than principal components analysis and thus have become very common in visual representations of single-cell transcriptomics data.
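
A minimal dimensionality reduction sketch, assuming scikit-learn; the synthetic data have 100 features per observation:

    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA

    # High-dimensional data: 100 features per observation.
    X, _ = make_classification(n_samples=500, n_features=100, random_state=0)

    # Reduce to 2 principal components for visualization or downstream modeling.
    pca = PCA(n_components=2)
    X_reduced = pca.fit_transform(X)
    print(X_reduced.shape)                # (500, 2)
    print(pca.explained_variance_ratio_)  # variation captured per component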

Deep Learning Architectures

Artificial neurons have been studied since the 1940s but over the last few decades have garnered further attention with the development of more advanced artificial neural network architectures and the aforementioned enhancements in computing power. The architecture of neural networks is loosely based on the human brain and consists of an initial layer which receives input, followed by a variable number of hidden layers before reaching the output. Each layer consists of several artificial neurons, or nodes, each converting its input into an output using a specific mathematical function (eg, logistic regression). The output from each artificial neuron is presented to the next layer's neurons as a new input. The unique property of neural networks, in the right configuration, to approximate any mathematical function has led to the development of DL as a subset of ML methodology, although with far more flexibility and suitability for nonlinear relationships. DL methodologies can use structured and unstructured data and conduct both supervised and unsupervised learning in tandem. There are several architectures of deep neural networks that have gained use in medical DL, each with its own uses.

A multilayer perceptron (Figure 2) is the simplest of the deep neural networks, consisting of an input layer, an output layer, and at least one hidden layer. Multilayer perceptrons are arranged in a fully connected fashion where each node from one layer is connected to each node in the next layer. Each of these connections carries a weight parameter that can be trained and optimized. Although a powerful network architecture on its own, it is often simply seen as a "fully connected layer" as a part of newer, more advanced architectures.
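
A minimal multilayer perceptron sketch, assuming PyTorch; the layer sizes are arbitrary choices for illustration:

    import torch
    import torch.nn as nn

    # Fully connected network: 20 input features -> 1 hidden layer -> 2 classes.
    mlp = nn.Sequential(
        nn.Linear(20, 64),  # every input node connects to every hidden node
        nn.ReLU(),          # nonlinear activation at each artificial neuron
        nn.Linear(64, 2),   # hidden layer to output classes
    )

    x = torch.randn(8, 20)  # a batch of 8 observations with 20 features each
    logits = mlp(x)
    print(logits.shape)     # torch.Size([8, 2])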

A convolutional neural network (CNN) is a DL architecture (Figure 2) designed to leverage the spatial organization of the input data. The input layer's nodes are specifically configured to only connect with specific adjacent nodes in the next layer to capture spatial structure. Although CNNs can be applied to any data type where the natural ordering holds some importance (eg, genetic sequences), they are particularly suited to image analysis and image classification. In the medical context, the CNN architecture is often seen in the classification of radiology and pathology images.
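
A minimal CNN sketch, assuming PyTorch; the grayscale images are randomly generated stand-ins:

    import torch
    import torch.nn as nn

    # Tiny CNN for single-channel (grayscale) 64x64 images, 2 output classes.
    cnn = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1),  # each filter connects only
        nn.ReLU(),                                   # to a small spatial patch
        nn.MaxPool2d(2),                             # downsample 64x64 -> 32x32
        nn.Flatten(),
        nn.Linear(16 * 32 * 32, 2),                  # fully connected output
    )

    images = torch.randn(4, 1, 64, 64)  # batch of 4 fake grayscale images
    print(cnn(images).shape)            # torch.Size([4, 2])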

Recurrent neural networks (RNNs) were created to be applied to time series or ordered sequential data. RNNs cycle between nodes in a layer, producing outputs for each sequential input based on prior inputs. This property gives RNNs a form of memory that allows recent inputs to influence the current output. As this memory is short-lived, advanced RNNs called long short-term memory (LSTM) networks have also been developed to sustain a much longer memory. RNNs and LSTMs have myriad uses in basic science, particularly with regard to protein structure determination and genetic sequence analysis. In clinical medicine, time series data from sensors or sequential data from the EHR have also been used with RNNs. Notably, the field of natural language processing also tends to use RNN and LSTM architectures to understand the context of sequential words.
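
A minimal recurrent sketch, assuming PyTorch; an LSTM layer processes a fake 10-step sequence of 4 sensor-like features:

    import torch
    import torch.nn as nn

    # LSTM over sequences of 10 time steps, each with 4 features
    # (eg, readings from 4 remote sensors).
    lstm = nn.LSTM(input_size=4, hidden_size=32, batch_first=True)

    sequence = torch.randn(1, 10, 4)  # (batch, time steps, features)
    outputs, (hidden, cell) = lstm(sequence)
    print(outputs.shape)  # torch.Size([1, 10, 32]): one output per time step,
                          # each influenced by the inputs that came before it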

Several other less common neural network architectures exist, such as graph convolutional networks, suited for data with arbitrary, unordered connections without an image-like structure, or autoencoders, an architecture designed to recode a data set in lower dimensionality. Importantly, DL remains a field of continuous innovation, with rapid development of new neural network architectures to meet unmet needs.

Conclusion

The use of AI and ML is not new in medicine, but the transition to the EHR and investment in computing power and data curation have led to a renaissance in these fields. Combined with the development of DL methodologies, the impact of AI and ML has been immense on basic science and translational discovery, and the effects are starting to be seen in clinical practice as well. However, with the availability of well-organized and labeled data still at a premium and continual innovation in computer science, the true impact of AI in medicine has likely not been felt yet. It is plausible that ML-based tools and AI-derived decision support systems will become increasingly common in gastroenterology and hepatology practices in the next decade. Thus, it is important that these techniques and their underlying science do not remain shrouded in mystery.

Authors' Contributions:

Puru Rattan contributed to manuscript conception, writing, and revision; Daniel D. Penrice and Douglas A. Simonetto contributed to manuscript writing and revision.

References

    1. Gulshan V., Peng L., Coram M., et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016; 316: 2402.
    2. Attia Z.I., Kapa S., Lopez-Jimenez F., et al. Screening for cardiac contractile dysfunction using an artificial intelligence–enabled electrocardiogram. Nat Med. 2019; 25: 70-74.
    3. Singh R., Kalra M.K., Nitiwarangkul C., et al. Deep learning in chest radiography: detection of findings and presence of change. PLoS One. 2018; 13: e0204155.
    4. Mori Y., Kudo S., Misawa M., et al. Real-time use of artificial intelligence in identification of diminutive polyps during colonoscopy. Ann Intern Med. 2018; 169: 357.
    5. de Groof A.J., Struyvenberg M.R., van der Putten J., et al. Deep-learning system detects neoplasia in patients with Barrett's esophagus with higher accuracy than endoscopists in a multistep training and validation study with benchmarking. Gastroenterology. 2020; 158: 915-929.e4.
    6. Eaton J.E., Vesterhus M., McCauley B.M., et al. Primary sclerosing cholangitis risk estimate tool (PREsTo) predicts outcomes of the disease: a derivation and validation study using machine learning. Hepatology. 2020; 71: 214-224.
    7. Ahn J.C., Connell A., Simonetto D.A., et al. Application of artificial intelligence for the diagnosis and treatment of liver diseases. Hepatology. 2021; 73: 2546-2563.
    8. Libbrecht M.W., Noble W.S. Machine learning applications in genetics and genomics. Nat Rev Genet. 2015; 16: 321-332.
    9. Ching T., Himmelstein D.S., Beaulieu-Jones B.K., et al. Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface. 2018; 15: 20170387.
    10. Greener J.G., Kandathil S.M., Moffat L., et al. A guide to machine learning for biologists. Nat Rev Mol Cell Biol. 2021.
    11. Yip K.Y., Cheng C., Gerstein M. Machine learning and genome annotation: a match meant to be? Genome Biol. 2013; 14: 205.
    12. Swan A.L., Mobasheri A., Allaway D., et al. Application of machine learning to proteomics data: classification and biomarker identification in postgenomics biology. OMICS. 2013; 17: 595-610.
    13. Toosi A., Bottino A.G., Saboury B., et al. A brief history of AI: how to prevent another winter (a critical review). PET Clin. 2021; 16: 449-469.
    14. Turing A.M. On computable numbers, with an application to the Entscheidungsproblem. Proc Lond Math Soc. 1937; s2-42: 230-265.
    15. Lindsay R.K., Buchanan B.G., Feigenbaum E.A., et al. DENDRAL: a case study of the first expert system for scientific hypothesis formation. Artif Intell. 1993; 61: 209-261.
    16. Schwartz W.B. Medicine and the computer. N Engl J Med. 1970; 283: 1257-1264.
    17. Yu V.L. Antimicrobial selection by a computer. JAMA. 1979; 242: 1279.
    18. Miller R.A., Pople H.E., Myers J.D. Internist-1, an experimental computer-based diagnostic consultant for general internal medicine. N Engl J Med. 1982; 307: 468-476.
    19. James G., Witten D., Hastie T., et al. An Introduction to Statistical Learning. Springer US, New York 2021.
    20. Bhutani V.K., Johnson L., Sivieri E.M. Predictive ability of a predischarge hour-specific serum bilirubin for subsequent significant hyperbilirubinemia in healthy term and near-term newborns. Pediatrics. 1999; 103: 6-14.
    21. Malinchoc M., Kamath P.S., Gordon F.D., et al. A model to predict poor survival in patients undergoing transjugular intrahepatic portosystemic shunts. Hepatology. 2000; 31: 864-871.
    22. Kamath P.S., Wiesner R.H., Malinchoc M., et al. A model to predict survival in patients with end-stage liver disease. Hepatology. 2001; 33: 464-470.
    23. Blatchford O., Murray W.R., Blatchford M. A risk score to predict need for treatment for upper-gastrointestinal bleeding. Lancet. 2000; 356: 1318-1321.
    24. Saito T., Rehmsmeier M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One. 2015; 10: e0118432.
    25. Hoerl A.E., Kennard R.W. Ridge regression: biased estimation for nonorthogonal problems. Technometrics. 2000; 42: 80.
    26. Zou H., Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc Series B Stat Methodol. 2005; 67: 301-320.
    27. Tibshirani R. Regression shrinkage and selection via the Lasso. J R Stat Soc Series B Stat Methodol. 1996; 58: 267-288.
    28. Boser B.E., Guyon I.M., Vapnik V.N. A training algorithm for optimal margin classifiers. In: Proceedings of the fifth annual workshop on computational learning theory, COLT '92. ACM Press, New York 1992: 144-152.
    29. Noble W.S. What is a support vector machine? Nat Biotechnol. 2006; 24: 1565-1567.
    30. Moon K.R., van Dijk D., Wang Z., et al. Visualizing structure and transitions in high-dimensional biological data. Nat Biotechnol. 2019; 37: 1482-1492.
    31. van der Maaten L., Hinton G. Visualizing data using t-SNE. J Mach Learn Res. 2008; 9: 2579-2605.
    32. McCulloch W.S., Pitts W. A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys. 1943; 5: 115-133.
    33. Lecun Y., Bengio Y., Hinton G. Deep learning. Nature. 2015; 521: 436-444.
