Featured Application
Machine learning model applications provide enough evidence to predict fish stock using aquaculture data, allowing for significant practical implications. By applying advanced statistical models, including Random Forest, Decision Tree, and Linear Regression, aquaculture stakeholders can improve their decision-making processes by predicting fish populations, mortality rates, and other critical metrics. Among the predictive models, the Random Forest model outperformed other models, reaching the highest accuracy. This allows more efficient and productive management strategies, supports sustainable development, and helps policymakers in enhancing decision-making to protect marine ecosystems. The dynamic for integrating such tools, such as predictive statistical models into real-time monitoring systems, provides a proactive strategy for aquaculture growth, risk mitigation, and long-term sustainability.
Abstract
The current study evaluates the performance of three machine learning models—Decision Trees, Random Forest, and Linear Regression—applied to aquaculture data to mitigate risks in aquaculture management. The performances of these models are analyzed and properly demonstrated using metrics including the Mean Squared Error (MSE), R-squared (R2), Root Mean Squared Error (RMSE), and Concordance Index (C-index). The Random Forest model achieved the highest prediction accuracy among all machine learning models, followed by Linear Regression and the Decision Trees. The scatter plot for Linear Regression demonstrates good predictive accuracy for mid-range values. However, it shows significant deviations at the extremes, indicating that the model struggles to capture the full range of variability in the data. The bar chart of coefficients pinpoints the variables with the greatest impact on the predictions, providing suggestions for potential areas that can be improved and providing model interpretability. Future work could incorporate more predictive statistics models focusing on improving the models for extreme values by assessing non-linear models, feature engineering methods, and expanding research into less influential variables. The results greatly impact several sections, including aquaculture management, policy-making, and operational strategies, providing valuable insights for stakeholders and decision-makers. Apache Spark was used for data processing and machine learning model implementation; Apache Cassandra was also used for data storage, ensuring efficient large dataset management and SQL tools for structured data handling; Oracle VM VirtualBox for cross-platform virtualization; and Spark Connector was also used.
Keywords: machine learning; data mining; algorithms assessment; decision trees; random forest; linear regression; Mediterranean-farmed fish; fish mortality; aquaculture performance metrics; Apache Spark
Gkikas MC, Gkikas DC, Vonitsanos G, Theodorou JA, Sioutas S. Application of Machine Learning for Predictive Analysis and Management of Mediterranean-Farmed Fish Mortalities: A Risk Management Case Study Using Apache Spark. Applied Sciences. 2024; 14(22):10112. https://doi.org/10.3390/app142210112