Application of various machine learning techniques to the analysis of clinical, anamnestic, and embryological data of patients undergoing assisted reproductive technologies
Drapkina Yu.S., Makarova N.P., Vasilev R.A., Amelin V.V., Frankevich V.E., Kalinina E.A.
Data analysis using machine learning (ML) enables more accurate and targeted identification of the most important modifiable and non-modifiable predictors of pregnancy in assisted reproductive technology (ART) programs for patients across different age groups. Predicting the performance of an ART program using ML can be achieved through various algorithms, depending on the data type and specific task at hand.
Objective: This study aimed to analyze the processing of clinical, anamnestic, and embryological data from patients undergoing ART using different ML methods, to determine the accuracy of ART outcome prediction with various algorithms, and to select the ML model with the greatest practical value for predicting the onset of pregnancy.
Materials and methods: This retrospective study included 854 married couples. It analyzed data from clinical and laboratory examinations, as well as parameters of the stimulated cycle, depending on the effectiveness of the ART program using the gradient boosting algorithm over decision trees (CatBoost).
Results: Key factors that significantly influence the effectiveness of ART include the presence or absence of a history of pregnancy, the concentration of sperm in the ejaculate, and the number of embryos with arrested development. A software product based on the gradient boosting algorithm was developed to predict the individual effectiveness of the ART programs.
Conclusion: Enhancing the prediction of the effectiveness of ART programs requires better mathematical models with an integrated approach to the problem and additional markers to improve the accuracy of the software product. Constructing a model that includes not only the couple’s history but also molecular markers using ML methods will allow for the most accurate determination of the most promising groups of patients for in vitro fertilization programs, and it will increase the efficiency of ART programs by selecting the highest-quality embryos for transfer.
Authors’ contributions: Drapkina Yu.S., Makarova N.P., Frankevich V.E., Kalinina E.A. – conception and design of the study; Drapkina Yu.S., Amelin V.V., Vasiliev R.A. – data collection and processing; Amelin V.V., Vasiliev R.A. – statistical analysis; Drapkina Yu.S., Amelin V.V., Vasiliev R.A. – drafting of the manuscript; Kalinina E.A., Frankevich V.E., Makarova N.P. – editing of the manuscript.
Conflicts of interest: The authors have no conflicts of interest to declare.
Funding: There was no funding for this study.
Ethical Approval: The study was reviewed and approved by the Research Ethics Committee of the V.I. Kulakov NMRC for OG&P.
Patient Consent for Publication: All patients provided informed consent for the publication of their data.
Authors’ Data Sharing Statement: The data supporting the findings of this study are available upon request from the corresponding author after approval from the principal investigator.
For citation: Drapkina Yu.S., Makarova N.P., Vasilev R.A., Amelin V.V., Frankevich V.E., Kalinina E.A. Application of various machine learning techniques to the analysis of clinical, anamnestic, and embryological data of patients undergoing assisted reproductive technologies.
Akusherstvo i Ginekologiya/Obstetrics and Gynecology. 2024; (3): 96-107 (in Russian)
https://dx.doi.org/10.18565/aig.2023.281
The development and implementation of technologies based on artificial intelligence (AI) represents a priority trend in modern healthcare, with machine learning (ML) being a key area of application [1]. Programs utilizing ML have the potential to significantly enhance diagnostic systems, facilitate drug development, and improve medical care quality, while reducing costs. The primary goal of ML is to create software capable of analyzing complex problems that lack guaranteed solution algorithms [2]. The processes in ML parallel those of data mining and predictive modeling, which are crucial for analyzing patterns and adjusting program actions.
ML is extensively used in various medical fields, including assisted reproductive technology (ART) [3]. In reproductive medicine, ML plays a crucial role in establishing relationships between specific characteristics, based on extensive sets of observation cases. For instance, it can predict pregnancy rates in ART programs based on clinical and anamnestic data, or forecast blastulation rates based on sperm quality. ML-based programs form the foundation of expert systems and simulate the decision-making processes of qualified experts. Currently, reproductive medicine emphasizes the development of expert systems for predicting infertility treatment outcomes and selecting therapy strategies, considering comprehensive information about couples [4].
It is worth noting that erroneous predictions of ART outcomes hinder timely treatment selection, compromise patient expectations, and impede the optimal allocation of funds from the Federal Fund for Mandatory Medical Insurance [5]. Therefore, predicting the effectiveness of ART programs has become a top priority in software development.
Various ML algorithms can predict the performance of ART programs, depending on the data type and task at hand. Key ML methods in reproductive medicine include logistic regression, decision tree algorithms, the random forest method, and gradient boosting over decision trees (e.g., XGBoost and CatBoost) [6]. Logistic regression addresses classification problems by indicating the probability of a given value belonging to a specific class. The decision-tree algorithm uses a hierarchical structure in the form of a tree model to make decisions. The tree is constructed by dividing the data into subsets based on variables for classification until only one class remains [7]. It is worth noting that one decision tree tends to overfit for a specific training sample; therefore, in practice, a composition of decision trees (Random Forest) should be used. The Random Forest algorithm is based on the use of several decision trees. Optimizing decision trees for a specific problem involves enumerating the variables and partition thresholds to find the best partition [8].
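The search for the best partition described above can be sketched in a few lines. This is a minimal illustration, assuming Gini impurity as the split criterion and midpoints between observed values as candidate thresholds; the toy data, the function names `gini` and `best_split`, and the two variables are hypothetical and are not taken from the study.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Enumerate every variable and threshold; return the split with the
    lowest weighted Gini impurity as (impurity, variable index, threshold)."""
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted({row[j] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or impurity < best[0]:
                best = (impurity, j, threshold)
    return best


# Hypothetical toy data: [age in years, AMH in ng/ml] -> pregnancy (1) or not (0).
rows = [[25, 3.1], [29, 2.4], [33, 1.8], [38, 0.9], [41, 0.4], [43, 0.2]]
labels = [1, 1, 1, 0, 0, 0]
impurity, variable, threshold = best_split(rows, labels)
```

A full decision tree would apply `best_split` recursively to the left and right subsets until each subset contains a single class or a depth limit is reached.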
Random Forest is widely used for building models, and its trees can be built in parallel. However, building deep trees on large datasets with many variables is inefficient because training is computationally expensive and time-consuming. Tree construction can be sped up by limiting the depth, but the accuracy of the model then decreases. Additionally, complex problems may require more trees. If, instead of building each tree independently of all the others, we take into account the “experience” gained from the previously built trees, the trees can be combined into a composition more effectively; this is the essence of the gradient boosting method. Gradient boosting builds each subsequent tree so that it minimizes the error of all previous trees. This principle is called “composition by induction.” The output of each tree is assigned a weight. Observations misclassified by the first decision tree are assigned more weight, after which the data are passed on to the next tree. After numerous rounds, boosting combines weak classifiers into a powerful prediction algorithm [9].
Gradient boosting can be applied not only to decision trees but also to other base algorithms. In other words, gradient boosting is gradient descent in the space of algorithms. Unlike Random Forest, gradient boosting overfits easily. Its effectiveness lies in using simple base algorithms (each one is trained only briefly, for example trees with a depth of approximately 5–8) and choosing each next one so that the error is minimized. This technique leads to significantly improved results, and on large volumes of data, gradient boosting works faster than the Random Forest method [10].
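The idea that each new tree corrects the error of the trees built before it can be shown with a deliberately small sketch. It assumes a squared-error loss, a single input variable, and depth-1 trees (stumps); the data and the names `fit_stump`, `predict`, and `boost` are hypothetical, and production libraries such as XGBoost or CatBoost use far more elaborate procedures.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree (one threshold, two leaf means) to the residuals."""
    best = None  # (squared error, threshold, left mean, right mean)
    values = sorted(set(xs))
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def predict(base, stumps, learning_rate, x):
    """Add the scaled outputs of all stumps to the base prediction."""
    total = base
    for threshold, left_mean, right_mean in stumps:
        total += learning_rate * (left_mean if x <= threshold else right_mean)
    return total


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Gradient boosting for squared loss: each stump is fitted to the
    residuals (the negative gradient) of the ensemble built so far."""
    base = sum(ys) / len(ys)
    stumps, errors = [], []
    for _ in range(n_rounds):
        predictions = [predict(base, stumps, learning_rate, x) for x in xs]
        residuals = [y - p for y, p in zip(ys, predictions)]
        errors.append(sum(r ** 2 for r in residuals) / len(xs))
        stumps.append(fit_stump(xs, residuals))
    return base, stumps, errors


# Hypothetical toy data with a step-like trend.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.8, 5.2, 5.0]
base, stumps, errors = boost(xs, ys)
```

The list `errors` holds the mean squared training error before each round, so comparing its first and last entries shows how the sequentially added stumps reduce the error of the ensemble.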
Currently, a large number of studies have been published on the development of a predictive model of the outcome of the ART program based on ML; however, most studies have analyzed the predictive ability of models built on the basis of Random Forest. One of the largest studies in the field of ML based on the Random Forest algorithm was published in 2022. This study included 24,730 couples undergoing infertility treatment using in vitro fertilization (IVF)/intracytoplasmic sperm injection (ICSI). The algorithm was trained using a Random Forest model and a logistic regression. The study identified the variables that most influenced the prognosis of treatment, among which the most significant contribution was made by the ovarian stimulation protocol, and Random Forest proved to be the most promising ML method [11]. Despite the positive results shown by algorithms based on random forests, these models have drawbacks that limit their application and operation.
This study aimed to analyze the processing of clinical, anamnestic, and embryological data from patients undergoing ART using different ML methods, to determine the accuracy of ART outcome prediction with various algorithms, and to select the ML model with the greatest practical value for predicting the onset of pregnancy.
Materials and methods
In the previous stage of the study, together with bioinformaticians and specialists in ML and AI, a pilot study was conducted and data from clinical and laboratory examinations and parameters of the stimulated cycle were analyzed according to the effectiveness of ART using the three most widely used ML algorithms: logistic regression, decision tree, and Random Forest [9]. According to the results obtained when comparing the three ML algorithms, the most accurate prediction of pregnancy rates in the ART program was obtained using the Random Forest algorithm. The model identified the most significant factors that are important in determining the effectiveness of the ART program: embryonic arrest, final oocyte maturation trigger, number of embryos of excellent and average quality, duration of stimulation, infertility factor, body mass index, and level of follicle-stimulating and anti-Müllerian hormone (AMH). In addition, the significance of the predictors, which were determined using the decision tree algorithm, was also confirmed using a Random Forest: the presence/absence of a history of pregnancies, parameters of the stimulated cycle, spermogram indicators on the day of puncture, number of embryos of excellent and good quality, and quality of embryos.
This study retrospectively included 854 couples aged 21–44 years who sought ART treatment for infertility. Written informed consent for processing personal data was obtained from each couple.
The criteria for inclusion in the study were infertility caused by tubo-peritoneal, male, or combined factors; chronic anovulation or diminished ovarian reserve; a normal karyotype in both spouses; ovarian stimulation according to a gonadotropin-releasing hormone (GnRH) antagonist protocol; a standard protocol for post-transfer support; retrieval of oocytes on the day of transvaginal puncture (TVP); and transfer of one embryo. The exclusion criteria were abnormalities in the structure of the uterus, karyotype abnormalities, and use of donor oocytes or sperm.
The patients included in the study were divided into five groups according to age: 21–24 years (group 1, n=100), 25–29 years (group 2, n=195), 30–34 years (group 3, n=220), 35–39 years (group 4, n=256), and 40–44 years (group 5, n=83).
Patients enrolled in the study underwent ovarian stimulation according to the GnRH antagonist protocol, starting on day 2 or 3 of the menstrual cycle. When a follicle diameter ≥17 mm was reached, patients were administered a human chorionic gonadotropin (HCG) trigger for final oocyte maturation (600 patients); in case of risk of ovarian hyperstimulation syndrome, the trigger was replaced with a GnRH agonist (171 patients), or a double trigger for final oocyte maturation was prescribed (83 patients). TVP was performed 35–36 hours after the trigger was administered, followed by oocyte retrieval and quality assessment. Oocyte fertilization was performed using IVF (5.6%), ICSI (81.9%), and physiological ICSI (PICSI) (12.5%). All stages of culture were carried out in COOK multi-gas incubators (Ireland) in 25 μl drops under oil (Irvine Sc., USA) at the B.V. Leonov Department of Assisted Technologies for the Treatment of Infertility. On the 5th day after fertilization, the embryo was transferred into the uterine cavity using a soft Wallace (Germany) or Cook (Australia) catheter. The remaining embryos of suitable quality were cryopreserved for later transfer. Luteal phase support and post-transfer management were performed according to generally accepted standard methods. On day 14 after embryo transfer, the β-subunit of human chorionic gonadotropin (β-HCG) was assessed. If the β-HCG result was positive, patients underwent pelvic ultrasonography 21 days after transfer to diagnose a clinical pregnancy. Further pregnancy management was individualized.
The embryological parameters of the stimulated cycle were also analyzed: sperm indicators on the day of TVP (sperm concentration, percentage of progressively motile spermatozoa, percentage of non-progressively motile spermatozoa, percentage of morphologically normal spermatozoa), number of oocyte-cumulus complexes (OCC), number of mature oocytes (MII), number of fertilized oocytes (2PN), embryo quality, number of blastocysts of excellent, good, and average quality, and number of embryos that had stopped developing. In addition, the incidence of clinical pregnancy was analyzed.
Statistical analysis and algorithm for building ML models
Together with experts in ML and AI, the study analyzed 51 input variables divided into three groups:
- Binary – discrete variables that take only two values;
- Categorical – discrete variables that take one of a finite number of values and indicate that an object belongs to a certain category;
- Real – continuous variables that take values from the set of real numbers.
Before the modeling stage, the analyzed sample was randomly divided into a training set (train, 70%) and a test set (test, 30%).
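A random 70/30 division of this kind can be sketched as follows. The shuffling procedure, the fixed seed, and the identifiers are assumptions made for illustration; the study does not describe how the split was implemented.

```python
import random


def train_test_split(items, train_fraction=0.7, seed=42):
    """Shuffle a copy of the items and cut it into training and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# One hypothetical identifier per couple; 854 couples were included in the study.
couple_ids = list(range(854))
train_ids, test_ids = train_test_split(couple_ids)
```

With 854 couples, this sketch assigns 597 to the training part and 257 to the test part, and no couple appears in both.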
The following algorithms were considered for modeling:
- Logistic regression;
- Decision tree;
- Random Forest;
- Gradient boosting over decision trees (XGBoost and CatBoost).
To analyze the model quality metrics, the precision and recall criteria were introduced.
Let us assume that there are two classes and that an algorithm predicts that each object belongs to one of the classes; then, the classification error matrix will look as shown in Table 1.
Thus, there are two types of classification errors: false negatives (FN) and false positives (FP). To evaluate the quality of the model's performance on each class separately, we used precision and recall metrics.
Precision can be interpreted as the proportion of objects labeled positive by the classifier that are actually positive, and recall is the proportion of all objects of the positive class that the algorithm found. Precision is what prevents assigning all objects to one class, because doing so increases the number of false positives. Recall shows the ability of the algorithm to detect a given class at all, and precision shows its ability to distinguish this class from other classes. Unlike accuracy, precision and recall remain informative when the classes are unbalanced and are therefore applicable to unbalanced samples.
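The two metrics follow directly from the cells of the classification error matrix. The sketch below uses made-up labels (1 is the positive class) and hypothetical function names; it also shows why precision penalizes a classifier that places every object in the positive class.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels where 1 is the positive class."""
    tp = fp = fn = tn = 0
    for truth, predicted in zip(y_true, y_pred):
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Share of true positives that the classifier found."""
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical labels: 4 positive and 6 negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)

# A degenerate classifier that calls everything positive.
all_positive = [1] * len(y_true)
tp_all, fp_all, fn_all, tn_all = confusion_counts(y_true, all_positive)
```

In this toy case the degenerate classifier reaches a recall of 1.0, but its precision falls to 0.4, the share of positive cases in the sample.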
There are several ways to combine precision and recall into an aggregate quality measure. One is the F-measure (F1-score, or Fβ in general), the harmonic mean of precision and recall:
$$F_\beta = (1 + \beta^2)\,\frac{\text{precision} \cdot \text{recall}}{\beta^2 \cdot \text{precision} + \text{recall}}$$
Here, β sets the relative weight of precision and recall in the metric; with β=1, it is the harmonic mean (with a factor of 2, so that F1=1 when precision = 1 and recall = 1).
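As a small worked example, the F-measure can be computed with the standard definition Fβ = (1 + β²)·precision·recall / (β²·precision + recall), in which values of β above 1 give more weight to recall. The function name and the input values are hypothetical.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta is 1)."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both metrics vanish
    beta_squared = beta ** 2
    return (1 + beta_squared) * precision * recall / (beta_squared * precision + recall)


f1_perfect = f_beta(1.0, 1.0)             # both metrics equal to one
f1_example = f_beta(0.6, 0.75)            # 2 * 0.45 / 1.35
f1_no_precision = f_beta(0.0, 1.0)        # one argument at zero
f2_example = f_beta(0.6, 0.75, beta=2.0)  # recall weighted more heavily
```

Because recall exceeds precision in the example, the F2 value lies closer to recall than the F1 value does.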
The F-measure reaches its maximum when recall and precision are both equal to one and is close to zero when either of them is close to zero. Another way to evaluate the model as a whole, without being tied to a specific threshold, is AUC-ROC: the area under the ROC curve, that is, the area bounded by the ROC curve and the false positive rate axis. The higher the AUC value, the better the classifier; a value of 0.5 indicates that the selected classification method is no better than random guessing.
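AUC-ROC has an equivalent interpretation that is easy to compute for small samples: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below uses this pairwise form, with ties counted as one half; the function name and the scores are hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly;
    tied scores count as half a correct pair."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


auc_perfect = roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])   # every positive ranked first
auc_constant = roc_auc([1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5])  # uninformative scores
auc_mixed = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])     # one pair out of four misranked
```

The pairwise form needs a number of comparisons equal to the product of the class sizes, so it suits illustration rather than large datasets.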
Results
The quality metrics that were used to compare the effectiveness of each model are listed in Table 2.
According to Table 2 and the results of the quality metric analysis, the CatBoost model shows the best results. Notably, the CatBoost model has another advantage over other ML models. The CatBoost algorithm works with categorical variables, which makes it possible to interpret the contribution of each indicator to the final model prediction (in other models, categorical variables are encoded using the One-Hot Encoding method; thus, instead of one indicator, N new ones appear, where N is the number of categories, which makes the interpretation process slightly more difficult) [12].
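The effect of One-Hot Encoding on interpretation can be seen in a minimal sketch. The trigger categories mirror the three options described in the Materials and methods section, while the column name and the function are hypothetical.

```python
def one_hot(values, name):
    """Replace one categorical column with one 0/1 column per category."""
    categories = sorted(set(values))
    columns = [f"{name}={category}" for category in categories]
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return columns, rows


# Four hypothetical cycles with three distinct trigger types.
triggers = ["hCG", "GnRH agonist", "dual", "hCG"]
columns, rows = one_hot(triggers, "trigger")
```

One original indicator becomes three columns here, so its importance is spread across three separate features, whereas a model that handles categorical variables natively reports a single contribution for the trigger type.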
The next step was to create a precision-recall curve to analyze the results of the trained model in more detail. This curve shows the trade-off between precision and recall. The classifier score for each test case indicates the confidence of the classifier in predicting the positive or negative class. For example, as shown in Figure 1, at 100% recall, the precision of the model was 58%.
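A precision-recall curve is obtained by sweeping the decision threshold over the classifier scores and recomputing both metrics at each step. The following sketch does this for a handful of made-up scores; the function name and the data are hypothetical and unrelated to Figure 1.

```python
def precision_recall_points(y_true, scores):
    """Return (threshold, precision, recall) for each distinct score used as a cut-off.
    A case is predicted positive when its score is at or above the threshold."""
    total_positive = sum(y_true)
    points = []
    for threshold in sorted(set(scores), reverse=True):
        predicted = [1 if s >= threshold else 0 for s in scores]
        tp = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, predicted) if t == 0 and p == 1)
        points.append((threshold, tp / (tp + fp), tp / total_positive))
    return points


# Hypothetical test cases: true class and classifier score.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = precision_recall_points(y_true, scores)
```

Lowering the threshold can only keep or raise recall, while precision typically falls; the point where recall first reaches 1.0 gives the kind of reading quoted above for Figure 1.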
The SHAP (SHapley Additive exPlanations) library was used to interpret the importance of each indicator in the final model prediction (Fig. 2). To evaluate the importance of the indicators, Shapley values were calculated; they take into account all possible combinations of indicators and thus show which factors actually matter for the prediction. To evaluate the importance of a given indicator, the predictions of the model "with" and "without" this indicator were compared.
To read the graph correctly, note that each point is a separate observation, and the higher a feature is placed on the y-axis, the more important it is for the pregnancy rate (Fig. 2). Values to the left of the central vertical line push the prediction toward the negative class (0), and those to the right push it toward the positive class (1); moreover, the thicker the line on the graph, the greater the number of observation points. The value of each feature is indicated by color: higher values are shown in red and lower values in blue.
In addition, SHAP allows for the interpretation of specific observations, that is, obtaining local explanations for a particular couple based on the importance of features. Consider the example of a patient of advanced reproductive age seeking treatment with ART, with primary infertility and an anti-Müllerian hormone (AMH) level of 0.1 ng/ml. According to the constructed model, the effectiveness of the program in this patient was low, even without additional analysis of sperm parameters and the woman's age, since only the AMH level and the presence/absence of a history of pregnancy contributed to the final prognosis in this case (Fig. 3).
The resulting graph shows the effects of the different variables on the final prediction of the model. In the case of classification, some variables shift the prediction toward class 0 and some toward class 1. If the Shapley value is positive (highlighted in pink), it shifts the prediction toward the positive class (1, to the right); if it is negative (highlighted in blue), it shifts the prediction toward the negative class (0, to the left).
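The "with" and "without" comparison behind Shapley values can be made concrete with an exact calculation on a toy model. This is only a sketch under stated assumptions: a feature counts as absent when it is replaced by a baseline value, the scoring rule and feature names are invented for illustration, and the SHAP library itself relies on more efficient estimation methods than this brute-force enumeration.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values: the average marginal contribution of each
    feature over all orderings in which features can be switched on."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)  # start "without" any feature of the instance
        previous = model(current)
        for name in order:
            current[name] = instance[name]  # switch the feature to its real value ("with")
            value = model(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(features):
    """Hypothetical scoring rule with an interaction term; not the model from the study."""
    score = 0.2
    score += 0.3 * features["previous_pregnancy"]
    score += 0.1 * features["amh_ng_ml"]
    score += 0.05 * features["previous_pregnancy"] * features["amh_ng_ml"]
    return score


instance = {"previous_pregnancy": 1, "amh_ng_ml": 2.0}
baseline = {"previous_pregnancy": 0, "amh_ng_ml": 0.0}
values = shapley_values(toy_model, instance, baseline)
```

The contributions add up to the difference between the prediction for the instance and the prediction for the baseline, which is the additivity property that gives SHAP its name.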
In the next stage, it was decided to exclude from the training sample the stimulated cycle parameters and the embryological stage to calculate a predictive indicator of the effectiveness of ART (excluded: IVF attempt, stimulation drug, stimulation duration, final oocyte maturation trigger, total dose, type of fertilization, sperm parameters, number of OCC, MII, 2PN, embryo quality, number of blastocysts of excellent, good, average quality, and number of arrested blastocysts).
The CatBoost algorithm was used as the model and the results are presented in Table 3.
Compared to the CatBoost model trained on all data, there was only a slight decrease in quality metrics: accuracy decreased by 0.03, and recall by 0.04. A graphical interpretation of the importance of each variable is shown in Figure 4.
It can be concluded that the parameters of the stimulated cycle and embryological stage certainly influence the onset of clinical pregnancy, but their influence is not significant. However, the results obtained can be explained in terms of a strong relationship between clinical and anamnestic parameters and corresponding indicators of the stimulated cycle and embryological stage, which requires further study.
Thus, the resulting system showed that the contribution of certain indicators to the final prognosis may differ for different patients. Similarly, the most important factors determining final prognosis may differ between couples. To verify the accuracy of the obtained model estimates, it is first necessary to expand the training sample to cases that are less common in this sample. However, we can conclude that the model produces fairly high metrics when solving the problem of determining the probability of a clinical pregnancy.
Discussion
Currently, ML methods are actively being implemented in medical information systems. This is driven primarily by the need to analyze large amounts of patient information and to predict the outcome of treatment [13]. In the pilot study, three ML algorithms were analyzed: logistic regression, decision tree, and Random Forest. According to the obtained metrics, the most accurate model of the pregnancy rate in the ART program was the one built using Random Forest [9]. Subsequently, the data were analyzed using gradient boosting over decision trees. The results showed that the most accurate models for solving classification problems are based on Random Forest and gradient boosting over decision trees. They account for more than 70% of all the developed models. These algorithms are versatile and often show higher quality; after training, it is possible to determine the importance of each variable (its contribution to the predictive power of the model) [11]. The basic classification algorithm, against which developed models are usually compared, is logistic regression. Its advantages are ease of implementation, speed of operation, and interpretability of the results (logistic regression estimates the probability of an event occurring, and its results can be interpreted in terms of the importance of each variable). During the analysis, the CatBoost implementation was chosen because it provides a convenient interface for working with categorical variables (to speed up the work, one of two analogs can be used: XGBoost or LightGBM; all these implementations show comparable results) [12].
One of the most powerful ML tools is the artificial neural network [14]. However, despite its superior prediction accuracy in some cases and its applicability to a wide variety of data, it has two significant drawbacks: the difficulty of interpreting how the algorithm operates and the resource intensity of training [15]. Given the size of the training dataset and the type of data collected, the use of neural networks was inappropriate in this study. Therefore, gradient boosting over decision trees, represented by the CatBoost library, was chosen.
CatBoost is based on a gradient boosting algorithm over decision trees. Gradient boosting is an ML technique for classification and regression problems that builds a prediction model in the form of an ensemble of weak predictive models, typically decision trees. The ensemble is trained sequentially; at each iteration, the deviations of the predictions of the already trained ensemble are calculated on the training set [16]. The next model to be added to the ensemble predicts these deviations. Thus, by adding the predictions of the new tree to those of the trained ensemble, it is possible to reduce the average deviation of the model, which is the target of the optimization problem. New trees are added to the ensemble as long as the error keeps decreasing or until one of the “early stopping” rules is satisfied. In practice, CatBoost most often shows higher quality; however, the algorithm has many hyperparameters, and finding their optimal values may take some time.
Another method analyzed in this study, Random Forest, uses an ensemble of decision trees built on randomly drawn subsets of the dataset [17]. A set of such classifier trees forms the forest. Each individual decision tree is generated using variable selection metrics such as the information gain criterion, gain ratio, and Gini index for each variable. Each tree is built on an independent random sample. In a classification problem, each tree votes, and the most popular class is selected as the final result [18].
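The two ingredients described above, random sampling of the training data and voting among trees, can be sketched as follows. The "trees" here are hand-written threshold rules standing in for trained decision trees, and the thresholds, feature names, and patient record are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement; each tree is built on such a sample."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(votes):
    """Return the class predicted by the largest number of trees."""
    return Counter(votes).most_common(1)[0][0]


# Stand-ins for trained trees: each maps a patient record to class 0 or 1.
trees = [
    lambda patient: int(patient["age"] < 38),
    lambda patient: int(patient["amh_ng_ml"] > 1.0),
    lambda patient: int(patient["previous_pregnancy"] == 1),
]

patient = {"age": 40, "amh_ng_ml": 1.5, "previous_pregnancy": 1}
votes = [tree(patient) for tree in trees]
forest_prediction = majority_vote(votes)

sample = bootstrap_sample(list(range(10)), random.Random(0))
```

Because sampling is done with replacement, a bootstrap sample usually repeats some rows and omits others, which is what makes the trees differ from one another.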
Random Forest has several advantages:
- It has high prediction accuracy, which is comparable to the results of gradient boosting.
- It does not require careful tuning.
- It is insensitive to outliers in data owing to random sampling.
- It is insensitive to scaling and other monotonic transformations of variable values.
- It rarely overfits; in practice, adding trees only improves the composition.
- It can efficiently handle data containing a large number of variables and classes.
- It works well with missing data and maintains good accuracy even with missing data.
- Continuous and discrete variables are handled equally well.
- It is highly parallelizable and scalable.
To build this algorithm, we used an implementation of the model for Python from the scikit-learn library at https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
However, the model has drawbacks. For categorical variables with a large number of values, the method tends to rate such variables as more important; partial shuffling of the values can reduce this effect [10]. From groups of correlated parameters whose importance turns out to be equal, smaller groups were selected. In addition, Random Forest is quite slow because it relies on many trees: each tree in the forest receives the same input data and must return its own prediction, after which the predictions are put to a vote. The Random Forest model is also more difficult to interpret than a single decision tree, where the outcome can easily be determined by following a path in the tree [19]. The CatBoost algorithm does not have these limitations and can be used for various classification problems and for working with big data. Notably, the maximum values of precision and recall in this study were obtained specifically for this model. The CatBoost model showed that the history of pregnancy (presence or absence of previous pregnancies, number of miscarriages and births), as well as the quality of the embryo and the number of arrested embryos, have the greatest impact on the effectiveness of the IVF program. In addition, the pregnancy rate is influenced by the final maturation trigger, duration of stimulation, number of blastocysts of good and excellent quality, spermogram indicators on the day of puncture, and number of MII oocytes.
In research in the field of reproductive medicine, the CatBoost algorithm has proven to be a highly effective and convenient method for predicting outcomes and optimizing treatment. In the study by Tikhaeva K. et al., the predicted response to ovarian stimulation in ART protocols was analyzed using gradient boosting, linear regression, decision tree algorithm, and Random Forest. The most accurate model was obtained using gradient boosting and Random Forest. The model included clinical and anamnestic data of patients, indicators of ovarian reserve, stimulated cycle parameters, and number of oocytes obtained in previous ART programs. The algorithm predicted the response to ovarian stimulation with an accuracy of 82.3% [20].
One of the largest studies on predicting the effectiveness of an ART program using various ML methods was published in 2022. Its purpose was to determine the most accurate ML model for predicting the effectiveness of ART and to identify the most significant factors influencing the prognosis. The study built a logistic regression model, decision tree, naive Bayes classifier, Random Forest, support vector machine, artificial neural networks, and gradient boosting over decision trees. The results were assessed using performance indicators (F1-score, specificity, accuracy, and area under the curve). The most accurate models for predicting implantation rates in the ART program were those obtained using the Random Forest algorithm and gradient boosting over decision trees (super-learners). In addition, the factors with the greatest impact on the predicted outcome of the ART program were the age of the mother, day of embryo transfer, total dose of gonadotropins, and concentration of estradiol on the day the trigger for final oocyte maturation was administered [21].
As noted above, logistic regression is the basic classification algorithm against which developed models are usually compared; its advantages include ease of implementation, speed of operation, and interpretability of results. In a study by Hansen K.R. et al., logistic regression was used to analyze implantation rates, clinical pregnancy rates, and live birth rates in 900 couples with unexplained infertility. The AUC values of the model for predicting implantation, clinical pregnancy, and live birth rates were 0.66, 0.64, and 0.65, respectively [22]. Meijerink A.M. et al. developed a multivariate logistic regression-based prediction model using a dataset obtained from 289 couples following TESE. The area under the curve of the logistic regression prediction model was 0.67 [23]. Thus, in the largest and most significant studies examining the accuracy of logistic regression, the AUC of the models averaged about 0.65, despite the representative samples of patients.
Compared to logistic regression, other ML algorithms are more sensitive and more accurate, and the limitations of traditional regression apply to them to a lesser degree. In reproductive medicine, such algorithms have been used in several studies. Blank C. et al. retrospectively analyzed data collected from 1052 couples following IVF/ICSI and embryo transfer. The dataset contained 32 variables, including continuous variables (male and female age, AMH level), categorical variables (stimulation protocols), and discrete variables (oocyte number). For predicting implantation after blastocyst transfer, the random forest algorithm showed better predictive performance than logistic regression in terms of AUC (0.74 for random forest and 0.66 for logistic regression) [24]. In the study by Qiu J. et al., variables such as age, AMH, duration of infertility, body mass index, number of previous births, miscarriages, abortions, and infertility factors were analyzed in 7188 women after the first IVF program and embryo transfer. Among the factors influencing pregnancy rate, the greatest contribution to the final prediction came from the presence/absence of a history of pregnancies/childbirth/miscarriages, which is consistent with the results obtained in this study [25]. According to the results of this study, one of the most important predictors of pregnancy rate was the presence/absence of pregnancy/miscarriage/childbirth in the anamnesis. These findings make a significant contribution to further research on the impact of various predictors on the effectiveness of ART programs. Primary infertility, as a factor that negatively affects pregnancy in the ART program, likely reflects more complex molecular biological markers of endometrial receptivity as well as the implantation potential of the resulting embryo, which requires further study.
Qiu J. et al. also compared the performance of ML models, including support vector machine, random forest, and gradient boosting (XGBoost), which significantly outperformed traditional logistic regression in personalized pregnancy rate prediction. Among the ML methods, the boosting and random forest algorithms were the most accurate, predicting pregnancy with 73% accuracy [25].
Notably, among all sperm parameters on the day of puncture, sperm concentration per 1 ml contributed the most to the final prognosis, which confirms the need to optimize the selection of the highest-quality sperm for fertilization. A high sperm concentration in the ejaculate gives the embryologist the opportunity to select the highest-quality sperm. In a study by Rodrigo et al., sperm concentration was the only ejaculate quality parameter that made a major contribution to the chromosomal status of the embryo [26]. In a study by Harris et al., fertilization rates were most affected by sperm concentration and motility, which further confirms the need to optimize the algorithm for preparing men for ART and for assessing the effectiveness of the treatment [27].
The most important indicator of the incidence of clinical pregnancy was the number of arrested embryos. The data obtained confirm that the embryo itself makes the greater contribution to implantation into the endometrium and that embryo quality depends on the initial quality of the gametes. Embryo quality is likely correlated with the number of arrested embryos. This number can serve as an auxiliary marker for assessing the chromosome set of the resulting embryo: the more embryos that stop developing in a patient in the ART program, the less likely pregnancy is to occur, even when an embryo of good or excellent quality is available. McCoy R.C. et al. showed that embryo arrest results from meiotic and mitotic cell division disorders and that embryos suitable for transfer may also carry an abnormal set of chromosomes [28].
Thus, the results of this study reflect the need for further investigation into the influence of certain factors on pregnancy rates. As part of this study, more than 500 samples of follicular fluid, seminal plasma, sperm, and embryo culture medium were collected from patients undergoing ART infertility treatment and stored in an integrated biological sample storage system [29] for a comprehensive assessment of treatment effectiveness and for optimizing the selection of the highest-quality embryo for transfer. In the future, the developed software product will make it possible to increase the pregnancy rate by optimizing modifiable factors and selecting the most promising embryo for transfer using metabolomic profiling of various biological samples, and to identify the most promising group of patients, allowing more effective allocation of clinical and financial resources.
Conclusion
Despite significant advances in the development of systems based on regression analysis, their predictive accuracy remains limited. Improving model quality therefore requires better mathematical models with an integrated approach to the problem, as well as additional markers to improve diagnostic accuracy. Such markers include various innovative methods for assessing the quality of embryos and gametes, which complement the prognostic value of the couple's medical history, parameters of the stimulated cycle, embryological stage, and standard methods for assessing the quality of oocytes and ejaculate. Building a model that includes not only the couple's medical history but also molecular markers, using complex mathematical systems, will make it possible to identify the most promising groups of patients for IVF, to optimize treatment in these groups, and to increase the effectiveness of ART by selecting the highest-quality embryo for transfer. It should be emphasized that creating the software product requires the participation of bioinformaticians and mathematicians specializing in ML and AI. In the future, however, the finished program can be used by any doctor without special training, owing to its simple and intuitive interface.
References
- Гусев А.В., Новицкий Р.Э., Ившин А.А., Алексеев А.А. Машинное обучение на лабораторных данных для прогнозирования заболеваний. ФАРМАКОЭКОНОМИКА. Современная фармакоэкономика и фармакоэпидемиология. 2021; 14(4): 581-92. [Gusev A.V., Novitskii R.E., Ivshin A.A., Alekseev A.A. Machine learning based on laboratory data for disease prediction. FARMAKOEKONOMIKA. Modern Pharmacoeconomics and Pharmacoepidemiology. 2021; 14(4): 581-92. (in Russian)]. https://dx.doi.org/10.17749/2070-4909/farmakoekonomika.2021.115.
- Драпкина Ю.С., Калинина Е.А., Макарова Н.П., Мильчаков К.С., Франкевич В.Е. Искусственный интеллект в репродуктивной медицине: этические и клинические аспекты. Акушерство и гинекология. 2022; 11: 37-44. [Drapkina Yu.S., Kalinina E.A., Makarova N.P., Mil’chakov K.S., Frankevich V.E. Artificial intelligence in reproductive medicine: ethical and clinical aspects. Obstetrics and Gynecology. 2022; (11): 37-44. (in Russian)]. https://dx.doi.org/10.18565/aig.2022.11.37-44.
- Сахибгареева М.В., Заозерский А.Ю. Разработка системы прогнозирования диагнозов заболеваний на основе искусственного интеллекта. Вестник РГМУ. 2017; 6: 42-6. [Sakhibgareeva M.V., Zaozerskii A.Yu. Developing an artificial intelligence-based system for medical prediction. Vestnik RGMU. 2017; (6): 42-6. (in Russian)].
- Ившин А.А., Багаудин Т.З., Гусев А.В. Искусственный интеллект на страже репродуктивного здоровья. Акушерство и гинекология. 2021; 5: 17-24. [Ivshin A.A., Bagaudin T.Z., Gusev A.V. Artificial intelligence on guard of reproductive health. Obstetrics and Gynecology. 2021; (5): 17-24. (in Russian)]. https://dx.doi.org/10.18565/aig.2021.5.17-24.
- Шелякин В.А., Губарева И.Д., Кокшарова Н.Г. О методике расчета дифференцированных подушевых нормативов финансового обеспечения обязательного медицинского страхования. Бюллетень Национального научно-исследовательского института общественного здоровья имени Н.А. Семашко. 2012; 5: 152-6. [Shelyakin V.A., Gubareva I.D., Koksharova N.G. On the method of calculating differentiated per capita standards for financial support of compulsory health insurance. Bulletin of the National Public Health Research Institute named after N.A. Semashko. 2012; (5): 152-6. (in Russian)].
- Raef B., Ferdousi R. A review of machine learning approaches in assisted reproductive technologies. Acta Inform. Med. 2019; 27(3): 205-11. https://dx.doi.org/10.5455/aim.2019.27.205-211.
- Elhazmi A., Al-Omari A., Sallam H., Mufti H.N., Rabie A.A., Alshahrani M. et al. Machine learning decision tree algorithm role for predicting mortality in critically ill adult COVID-19 patients admitted to the ICU. J. Infect. Public Health. 2022; 15(7): 826-34. https://dx.doi.org/10.1016/j.jiph.2022.06.008.
- Barberis E., Khoso S., Sica A., Falasca M., Gennari A., Dondero F. et al. Precision medicine approaches with metabolomics and artificial intelligence. Int. J. Mol. Sci. 2022; 23(19): 11269. https://dx.doi.org/10.3390/ijms231911269.
- Драпкина Ю.С., Макарова Н.П., Татаурова П.Д., Калинина Е.А. Поддержка врачебных решений с помощью глубокого машинного обучения при лечении бесплодия методами вспомогательных репродуктивных технологий. Медицинский Совет. 2023; 15: 27-37. [Drapkina Yu.S., Makarova N.P., Tataurova P.D., Kalinina E.A. Deep machine learning applied to support clinical decision-making in the treatment of infertility using assisted reproductive technologies. Medical Council. 2023; (15): 27-37. (in Russian)]. https://dx.doi.org/10.21518/ms2023-368.
- Salditt M., Humberg S., Nestler S. Gradient tree boosting for hierarchical data. Multivariate Behav. Res. 2023; 58(5): 911-37. https://dx.doi.org/10.1080/00273171.2022.2146638.
- Wang C.W., Kuo C.Y., Chen C.H., Hsieh Y.H., Su E.C. Predicting clinical pregnancy using clinical features and machine learning algorithms in in vitro fertilization. PLOS One. 2022; 17(6): e0267554. https://dx.doi.org/10.1371/journal.pone.0267554.
- Al-Shehari T., Alsowail R.A. An insider data leakage detection using one-hot encoding, synthetic minority oversampling and machine learning techniques. Entropy (Basel). 2021; 23(10): 1258. https://dx.doi.org/10.3390/e23101258.
- Акжолов Р.К. Машинное обучение. Вестник науки. 2019; 3(6): 348-51. [Akzholov R.K. Machine learning. Vestnik Nauki. 2019; 3(6): 348-51. (in Russian)].
- Hancock J.T., Khoshgoftaar T.M. CatBoost for big data: an interdisciplinary review. J. Big Data. 2020; 7(1): 94. https://dx.doi.org/10.1186/s40537-020-00369-8.
- Бучацкая В.В., Бучацкий П.Ю., Лобанов В.Е. Анализ алгоритмов прогнозирования. Вестник Адыгейского государственного университета. Серия 4: Естественно-математические и технические науки. 2020; 4: 49-52. [Buchatskaya V.V., Buchatskii P.Yu., Lobanov V.E. Analysis of forecasting algorithms. Bulletin of Adygea State University. Series 4: Natural-mathematical and technical sciences. 2020; (4): 49-52. (in Russian)].
- XGBoost. Университет ИТМО. Электронные текстовые данные. Доступно по: https://neerc.ifmo.ru/wiki/index.php?title=XGBoost [Дата обращения: 24.10.2021]. [XGBoost. Universitet ITMO. Elektronnye tekstovye dannye. Available at: https://neerc.ifmo.ru/wiki/index.php?title=XGBoost [Accessed 24.10.2021]. (in Russian)].
- Ghosh D., Cabrera J. Enriched random forest for high dimensional genomic data. IEEE/ACM Trans. Comput. Biol. Bioinform. 2022; 19(5): 2817-28. https://dx.doi.org/10.1109/TCBB.2021.3089417.
- Hu J., Szymczak S. A review on longitudinal data analysis with random forest. Brief. Bioinform. 2023; 24(2): bbad002. https://dx.doi.org/10.1093/bib/bbad002.
- Ganaie M.A., Tanveer M., Suganthan P.N., Snasel V. Oblique and rotation double random forest. Neural Netw. 2022; 153: 496-517. https://dx.doi.org/10.1016/j.neunet.2022.06.012.
- Tikhaeva K., Nesterova N., Tomilov E., Sotkin S., Muhina A., Zavyalov P., Rosyuk E. Gradient boosting predictive model of ovarian response for hormonal therapy in infertility treatment. Russian Advances in Fuzzy Systems and Soft Computing: Selected Contributions to the 10th International Conference «Integrated Models and Soft Computing in Artificial Intelligence» (IMSC-2021), May 17-20, 2021, Kolomna, Russian Federation.
- Yiğit P., Bener A., Karabulut S. Comparison of machine learning classification techniques to predict implantation success in an IVF treatment cycle. Reprod. Biomed. Online. 2022; 45(5): 923-34. https://dx.doi.org/10.1016/j.rbmo.2022.06.022.
- Hansen K.R., He A.L., Styer A.K., Wild R.A., Butts S., Engmann L. et al. Predictors of pregnancy and live-birth in couples with unexplained infertility after ovarian stimulation-intrauterine insemination. Fertil. Steril. 2016; 105(6): 1575-83.e2. https://dx.doi.org/10.1016/j.fertnstert.2016.02.020.
- Meijerink A.M., Cissen M., Mochtar M.H., Fleischer K., Thoonen I., de Melker A.A. et al. Prediction model for live birth in ICSI using testicular extracted sperm. Hum. Reprod. 2016; 31(9): 1942-51. https://dx.doi.org/10.1093/humrep/dew146.
- Blank C., Wildeboer R.R., DeCroo I., Tilleman K., Weyers B., de Sutter P. et al. Prediction of implantation after blastocyst transfer in in vitro fertilization: a machine-learning perspective. Fertil. Steril. 2019; 111(2): 318-26. https://dx.doi.org/10.1016/j.fertnstert.2018.10.030.
- Qiu J., Li P., Dong M., Xin X., Tan J. Personalized prediction of live birth prior to the first in vitro fertilization treatment: a machine learning method. J. Transl. Med. 2019; 17(1): 317. https://dx.doi.org/10.1186/s12967-019-2062-5.
- Rodrigo L., Meseguer M., Mateu E., Mercader A., Peinado V., Bori L. et al. Sperm chromosomal abnormalities and their contribution to human embryo aneuploidy. Biol. Reprod. 2019; 101(6): 1091-101. https://dx.doi.org/10.1093/biolre/ioz125.
- Harris A.L., Vanegas J.C., Hariton E., Bortoletto P., Palmor M., Humphries L.A. et al. Semen parameters on the day of oocyte retrieval predict low fertilization during conventional insemination IVF cycles. J. Assist. Reprod. Genet. 2019; 36(2): 291-8. https://dx.doi.org/10.1007/s10815-018-1336-9.
- McCoy R.C., Summers M.C., McCollin A., Ottolini C.S., Ahuja K., Handyside A.H. Meiotic and mitotic aneuploidies drive arrest of in vitro fertilized human preimplantation embryos. Genome Med. 2023; 15(1): 77. https://dx.doi.org/10.1186/s13073-023-01231-1.
- Долудин Ю.В., Драпкина Ю.С., Сазонкина П.О., Киселев А.Р., Горбунов К.С. Виртуальная система хранения биологических образцов и ассоциированных данных. Свидетельство о государственной регистрации программы для ЭВМ. Номер свидетельства: RU 2023610092. Патентное ведомство: Россия. Год публикации: 2023. Номер заявки: 2022686282. Дата регистрации: 19.12.2022. [Doludin Yu.V., Drapkina Yu.S., Sazonkina P.O., Kiselev A.R., Gorbunov K.S. Virtual storage system for biological samples and associated data. Certificate of state registration of a computer program. Certificate number: RU 2023610092. Patent Office: Russia. Year of publication: 2023. Application number: 2022686282. Registration date: 19/12/2022. (in Russian)].
Received 06.12.2023
Accepted 26.02.2024
About the Authors
Yulia S. Drapkina, PhD, Researcher at the Department of IVF named after Prof. B.V. Leonov, Academician V.I. Kulakov National Medical Research Center for Obstetrics, Gynecology and Perinatology, Ministry of Health of Russia, 117997, Russia, Moscow, Academician Oparin str., 4, yu_drapkina@oparina4.ru, https://orcid.org/0000-0002-0545-1607
Natalya P. Makarova, PhD, Leading Researcher at the Department of IVF named after Prof. B.V. Leonov, Academician V.I. Kulakov National Medical Research Center for Obstetrics, Gynecology and Perinatology, Ministry of Health of Russia, 117997, Russia, Moscow, Academician Oparin str., 4, np_makarova@oparina4.ru, https://orcid.org/0000-0003-8922-2878
Robert A. Vasiliev, Head of the Laboratory of Applied Artificial Intelligence Z-union, Vice-President of the Association of Laboratories for the Development of Artificial Intelligence, graduate student at the Moscow Institute of Physics and Technology (MIPT), Master of the Department of Applied Physics and Mathematics of the Moscow Institute of Physics and Technology, Master of Economics, Bachelor’s degree at the Research University «Moscow Institute of Electronic Technology».
Vladislav V. Amelin, Technical Director of the Laboratory of Applied Artificial Intelligence Z-union, expert in machine learning, Master’s degree from Moscow State University (Faculty of Computational Mathematics and Cybernetics, Department of Mathematical Methods), Bachelor’s degree from the National Research University «Moscow Institute of Electronic Technology».
Vladimir E. Frankevich, Dr. Sci. (Physical and Mathematical Sciences), Deputy Director of the Institute of Translational Medicine, Academician V.I. Kulakov National Medical Research Center for Obstetrics, Gynecology and Perinatology, Ministry of Health of Russia, 117997, Russia, Moscow, Academician Oparin str., 4, v_frankevich@oparina4.ru, https://orcid.org/0000-0002-9780-4579
Elena A. Kalinina, Dr. Med. Sci., Professor, Head of the Department of IVF named after Prof. B.V. Leonov, Academician V.I. Kulakov National Medical Research Center for Obstetrics, Gynecology and Perinatology, Ministry of Health of Russia, 117997, Russia, Moscow, Academician Oparin str., 4, e_kalinina@oparina4.ru, https://orcid.org/0000-0002-8922-2878