Shabana Ramzan, Yazeed Yasin Ghadi, Hanan Aljuaid, Aqsa Mahmood,* and Basharat Ali
1 Department of Computer Science & IT, Government Sadiq College Women University, Bahawalpur, 63100, Pakistan
2 Department of Computer Science/Software Engineering, Al Ain University, Al Ain, 64141, UAE
3 Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University (PNU), Riyadh, 11671, Saudi Arabia
4 Agronomic Research Station, Bahawalpur, 63100, Pakistan
ABSTRACT Traditional farming procedures are time-consuming and expensive because they rely on manual labor. Farmers often lack the knowledge needed to select a crop that suits the environmental factors and soil characteristics of their land. This is a main reason for low crop yields and for the economic crisis in the agricultural sector of many countries. Modern technologies such as the Internet of Things (IoT), machine learning, and ensemble learning can help farmers observe soil factors such as electrical conductivity (EC) and environmental factors such as temperature in order to improve crop yield. These parameters play a vital role in suggesting a suitable crop to cope with food scarcity. This paper proposes a system comprising two modules: the first module uses static data, and the second module uses hybrid data collection (IoT-based real-time data and manual data), with machine learning and ensemble learning algorithms, to suggest the crop that maximizes the yield of the farm. Python is used to train the model that predicts the crop. The system offers farmers an intelligent and low-cost solution for processing the data and predicting the suitable crop. We implemented the proposed system in the field. The generated results confirm the efficiency and accuracy of the proposed system in predicting the crop.
KEYWORDS Machine learning; Internet of Things; sensors; ensemble learning
Agriculture, or farming as it is commonly known, is the practice of growing crops and raising livestock. When people began growing crops, they also began herding and breeding animals, many of which are sources of meat, milk, cheese, and butter. Agriculture provides food as well as raw materials for industry, such as wood, cotton, leather, wool, and paper products. It also helps produce the resources needed to create commercial products and contributes greatly to a country's economy. Conventional farming is practiced all over the world. It relies on techniques suggested by experienced farmers and on manual labor to handle crops and livestock, which often leads to inefficient resource use. There is a pressing need worldwide to increase agricultural productivity for the following reasons:
• Demand for agricultural products is rising sharply with the growth of the human population. According to a study [1], the global middle-class population is expected to rise from 1.8 billion in 2009 to 4.9 billion in 2030, leading to a drastic rise in demand for dairy products.
• Low crop yields caused by climatic changes could lead to food scarcity. In such circumstances, surplus food can meet the needs of humans and animals and can be traded in exchange for other goods.
• Farmers have economic concerns about their agricultural products, so they want maximum crop yield.
Farmers need to learn techniques and methods that increase crop yield and maximize income. They must have proper information, such as temperature, humidity, and soil pH, about all the factors involved in selecting a suitable crop for their field [2–5]. Farmers should also know and consider the factors that are crucial for crop growth after sowing. Considering all relevant parameters improves the chances of a healthy crop, a higher yield, and maximum income. The agriculture sector is suffering because of poor farming decisions and insufficient resources. Smart farming can overcome these drawbacks of conventional farming by replacing it with modern technologies such as machine learning algorithms [6] and IoT. Recent trends such as IoT [7], data science, and big data analytics [8–10] have a great impact on people's daily lives and on fields such as health and agriculture. An IoT-based agriculture system ensures continuous monitoring of the field, and such systems can enhance crop yield [1].
Farmers, the main stakeholders, do not have the information they need to make the most suitable farming decisions. An automated decision-making method is required to choose suitable crops according to soil conditions and climatic changes. Such systems complement the knowledge and experience that farmers have passed down through generations [11]. The performance of a decision support system (DSS) depends on the accuracy of its inputs and the efficiency of the machine learning (ML) algorithm used for data analysis. Temperature, humidity, soil pH, rainfall, and light are the major factors to consider when monitoring crop growth and improving crop yield. An automated monitoring system takes inputs at regular intervals and works with little or no human intervention.
Climatic conditions vary because of global warming, which particularly affects water levels. These variations are difficult to detect, but IoT is very helpful for measuring air humidity, temperature, and soil moisture [12]. To monitor crop growth and increase yield, it is very important to know the soil moisture content, soil pH, temperature, and weather conditions. A fuzzy logic-based irrigation system makes decisions using data loggers and sensors [13]. Another irrigation system uses IoT to sense environmental factors and sends the data to a controller that operates the water pump [14]. The researchers in [15] used fuzzy logic to study crop yield estimation under three parameters: moisture, humidity, and temperature.
Machine learning plays an important role in agricultural applications that address the problems of conventional farming. Machine learning-based applications have several phases: data acquisition, data preprocessing (data preparation), model selection and training, model evaluation, parameter tuning, and finally prediction of the dependent attributes. Machine learning may be used to predict agricultural attributes such as yield, crop growth, soil fertility, plant disease, and species identity. The study in [16] analyzed temperature and rainfall data using different machine learning algorithms, an approach that is very useful for low- to middle-income countries.
The motivation behind this research is to produce a system that overcomes the challenges of conventional agriculture, for example high cost, time consumption, and low yield, which farmers struggle with because of poor knowledge of soil, environmental factors, and their effects. Our proposed system takes air temperature, air humidity, rainfall, soil pH, soil moisture, nitrogen (N), potassium (K), and phosphorus (P) as inputs, both as static data and as real-time sensor data, and predicts the suitable crop using ML and EL (Ensemble Learning). Fig. 1 shows the diagram of the proposed model. The input variables are environmental factors and soil characteristics, and the output variable, predicted from these inputs, is the suitable crop for a particular field block.
Figure 1: Model of proposed approach
The main contributions of the proposed approach are as follows:
• To design a crop prediction system that uses static data and sensor input data with ML and EL.
• To use static data as well as data collected by various sensors, such as air temperature and electrical conductivity.
• To perform descriptive analysis in order to understand the dataset.
• To perform data visualization in order to analyze the dataset.
• To decide the suitable crop for the field with an intelligent approach based on ML and EL algorithms that considers soil characteristics and environmental factors.
• To evaluate the models by measuring accuracy, precision, recall, and F1-score.
The rest of this manuscript is organized as follows: Section 2 discusses related work on IoT-based, ML-based, and EL-based systems for smart farming. Section 3 explains the proposed methodology, including the proposed architecture, the data analysis techniques, and the hardware used. Section 4 presents the results and evaluation of the proposed approach, and Section 5 presents the conclusion and future work.
The agriculture sector demands smart farming systems that increase the yield of agricultural products by automatically monitoring and controlling environmental factors and soil characteristics. Researchers in this domain have proposed different systems for open-field farming. The existing systems considered various factors relevant to smart farming, such as the effects of air temperature, air humidity, and soil moisture on crop health and crop yield. Artificial intelligence (AI) and IoT can play a vital role in building sensor-based smart farming systems that predict the suitable crop for a particular field area. The authors of [17] highlighted the leading role of artificial intelligence in current and future endeavors. A weather-sensor-based smart system monitored the health status of vineyards to improve the quality of wine [18]. It involved several operations: data analysis was performed remotely, vegetation analysis was done by video processing, data was exchanged wirelessly, and finally the data was monitored and evaluated. This system used various subsystems and was tested in real scenarios at two pilot sites. AI also faces several issues; the methodology presented in [19] uses artificial intelligence to identify and counter cyber threats and risks.
To increase crop yield, an agriculture system is required that monitors plant growth, adjusts the supply of water and fertilizer to the needs of the crop, and controls the environmental factors. Leaf health monitoring can be very useful in predicting the maturity and growth of leafy crops. A smart camera-based system was proposed that measures leaf dimensions with ultrasonic sensors to analyze the age and maturity of crops and predict leaf growth [20]. The system was implemented and tested on an available dataset of tomato leaf growth. The results do not cover longer growth phases because a suitable crop dataset was unavailable. An IoT-based protocol was designed to monitor the farm remotely and make decisions accordingly for smart agriculture applications [21]. The energy consumption, latency, and delay of network communication are controlled using routing and clustering algorithms, and the K-means algorithm is used to form the clusters.
Smart agriculture systems can be made more advanced with image processing: cameras installed in crop fields capture images of the field, and image processing techniques then classify the images and extract features and patterns as required. One study proposed a system that combined IoT and image processing to collect real-time data with sensors and cameras, analyze the data to make decisions, and communicate with farmers [22]. A CC3200 chip-based system was proposed that has a network processor, a Wi-Fi unit, and a microcontroller and that provides the farmer with accurate information such as air humidity and temperature [23]. The IoT-based system PMAS collects various kinds of data from multiple sources to guide the agricultural production process [24].
Soil deterioration reduces the economic benefits of agricultural products because it forces farmers to use more water, fertilizer, and nutrients. It arises from farmers' lack of awareness: they do not select crops according to field conditions, environmental factors, and soil characteristics. There is therefore a pressing need to address the issues that lead to crop mismanagement and nutrient depletion. An equation was proposed to control wind erosion and soil degradation [25]. Another study proposed an AI-based technique that used soft sensors and deep learning [26]. The input is pre-processed to remove noise and clean the data, a neural network is used for feature representation, and classification is finally done using an autoencoder and a kernel-based convolutional network architecture.
An IoT-based approach was proposed to predict the sugar yield from sugarcane. The process output in the prediction model was investigated using a machine learning-based approach. All the methods were assessed, and the authors reported that the proposed method was the best and most efficient [27]. The authors of [28] predicted the collapsibility of loessial soils using machine learning algorithms. The smart agriculture system in [29] predicts the crop yield and also suggests a crop based on the crops previously sown in the farmland, so that soil nutrients are used according to the needs of the crop.
An ensemble learning model used different machine learning models as base-level models to predict the leaf area index of summer maize under various fertilizer and water conditions [30]. It used the stacking technique of ensemble learning and was evaluated with metrics such as Root Mean Square Error (RMSE) and R2. Ensemble learning has also been used to analyze user behavior and to target particular audiences with content [31]; this system improved behavioral analytics by classifying users' retail behavior. Table 1 compares the proposed study with previous smart farming approaches.
Table 1: Comparative study of smart farming approaches
• Existing studies used only a few algorithms for prediction, whereas our system uses 14 algorithms.
• Existing studies used either ML or EL for prediction, whereas our system uses both types of algorithms.
• Existing studies did not use advanced algorithms for intelligent decision-making; we use advanced algorithms such as LightGBMClassifier, Gradient Boosting, Voting Classifier with Hard Voting, Voting Classifier with Soft Voting, AdaBoostClassifier, Stochastic Gradient Boosting Classifier, CatBoostClassifier, and ExtraTreesClassifier.
• Existing studies made predictions from either sensor data or static data, whereas our system is flexible because it includes a module for each.
• For the water requirement, existing studies considered only rainfall, groundwater, or surface water, whereas our system considers all three sources as the water requirement parameter.
The system uses a static dataset for module-1 and hybrid data collection for module-2, in which various sensors collect real-time data and manual data are gathered in different regions of Punjab, Pakistan. Fig. 2 shows the architecture of the proposed methodology.
Figure 2: Architecture of proposed methodology
3.1 Module-1
The IoT-based intelligent crop recommendation system takes inputs from the dataset, namely air temperature, air humidity, rainfall, soil pH, N, P, and K, and predicts the suitable crop for the field using 14 ML and EL algorithms: Gaussian Naïve Bayes (GNB), K-Nearest Neighbor (KNN), Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), LightGBMClassifier (LGBM), Support Vector Classifier (SVC), Gradient Boosting (GB), Voting Classifier with Hard Voting (HV), Voting Classifier with Soft Voting (SV), AdaBoostClassifier (AD), Stochastic Gradient Boosting Classifier (SGB), CatBoostClassifier (CB), and ExtraTreesClassifier (ET).
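The difference between the hard-voting and soft-voting ensembles listed above can be illustrated with a minimal, library-free sketch. It is not the authors' implementation: the function names `hard_vote` and `soft_vote` and the class probabilities are hypothetical, chosen only to show a case where the two rules disagree.

```python
from collections import Counter


def hard_vote(predicted_labels):
    """Return the label predicted by the largest number of base classifiers."""
    return Counter(predicted_labels).most_common(1)[0][0]


def soft_vote(probabilities):
    """Average the class probabilities of all classifiers and return the top class."""
    classes = probabilities[0].keys()
    averaged = {
        c: sum(p[c] for p in probabilities) / len(probabilities) for c in classes
    }
    return max(averaged, key=averaged.get)


# Hypothetical outputs of three base classifiers for a single field sample.
base_probabilities = [
    {"rice": 0.55, "maize": 0.45},
    {"rice": 0.60, "maize": 0.40},
    {"rice": 0.05, "maize": 0.95},
]
# Each classifier's hard prediction is its most probable class.
base_labels = [max(p, key=p.get) for p in base_probabilities]

print("hard voting:", hard_vote(base_labels))
print("soft voting:", soft_vote(base_probabilities))
```

In this toy case two weakly confident classifiers outvote one strongly confident classifier under hard voting, while soft voting follows the averaged probabilities, which is why the two variants can report different accuracies on the same data.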
3.1.1 Working Algorithm
The working algorithm (Algorithm 1) takes the input data and performs all the steps needed to produce the required output; the proposed approach uses machine learning algorithms for prediction.
3.1.2 Dataset Used
We used the Kaggle crop recommendation dataset [32] for the proposed approach. The dataset has the following attributes: the N, K, and P contents of the soil, temperature, humidity, soil pH, and rainfall. Fig. 3 shows a scatter plot of temperature against humidity that illustrates the data distribution. The crops are rice, maize, jute, cotton, banana, orange, coconut, apple, papaya, watermelon, grapes, mango, pomegranate, lentil, black gram, mungbean, mothbean, kidney bean, pigeon peas, chickpea, muskmelon, and coffee. The dataset therefore contains twenty-two classes, and each class has one hundred (100) samples.
3.1.3 Dataset Plotting for Data Visualization
To identify the most informative features for the prediction model, a heatmap of the correlation matrix is used. The heatmap encodes the strength of the correlation between each pair of variables as color intensity. The potassium and phosphorus levels are highly correlated, as shown in Fig. 4. The dark blue cells show high values and the yellow cells show low values. Fig. 5 shows how the data values are concentrated with respect to humidity.
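The values behind such a heatmap are pairwise Pearson correlation coefficients. The sketch below computes them without any plotting library; the helper names `pearson` and `correlation_matrix` and the five sample rows are hypothetical and are not taken from the dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict holding the correlation of every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical feature columns (illustrative values only).
samples = {
    "P": [40, 55, 60, 130, 140],
    "K": [40, 45, 50, 195, 205],
    "rainfall": [200, 80, 150, 110, 90],
}

matrix = correlation_matrix(samples)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

With pandas and seaborn, which the paper lists among its libraries, the same matrix would typically come from `DataFrame.corr()` and be drawn with `seaborn.heatmap`.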
3.1.4 Decision Making
To decide on the suitable crop for the field area, our approach uses the KNN machine learning algorithm to predict the crop from the dataset attributes, as shown in Fig. 6. We implemented the KNN algorithm in Python, importing the numpy, matplotlib, seaborn, openpyxl, and pandas libraries. The steps of the decision-making process are detailed below.
Figure 3: Dataset distribution
Figure 4: Correlation between features
Figure 5: Data concentration around humidity
Figure 6: Dataset analysis for decision making
3.1.5 Data Pre-Processing
Data pre-processing is an important step in preparing the data for decision-making. We checked the dataset for missing and null values and found none. We then normalized the data. Of the various normalization techniques available, we used MinMaxScaler to scale each feature to the range 0 to 1. Min-max scaling applies the following formula:

$$B_{SC} = \frac{B - B_{min}}{B_{max} - B_{min}} \tag{1}$$

where
$B$: is the original value
$B_{SC}$: is the resulting value
$B_{min}$: is the minimum value
$B_{max}$: is the maximum value of the given dataset.
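The min-max scaling step described above can be sketched in a few lines of plain Python. The function name `min_max_scale` and the temperature readings are hypothetical; mapping a constant column to 0.0 is a choice made for this sketch only.

```python
def min_max_scale(values):
    """Scale a list of numbers to the range [0, 1] with min-max normalization."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant column carries no spread; map it to 0.0 (sketch-level choice).
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical air temperature readings in degrees Celsius.
temperatures = [18.0, 24.5, 31.0]
scaled = min_max_scale(temperatures)
print(scaled)
```

When the data are later divided into training and test sets, a common practice is to take the minimum and maximum from the training set only and reuse them for the test set, so that no information from the test data leaks into the model.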
3.1.6 Data Splitting
First, we split the dataset into two sets, the training set and the test set. The training set is the subset of the dataset used to train the model so that it can make accurate predictions. The test set is the subset used to validate and evaluate the model. The data are split between training and test sets in an 80:20 ratio.
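A minimal sketch of an 80:20 split is given below, assuming a simple shuffled split with a fixed seed; the paper does not state whether the split was shuffled or stratified. The function name `split_dataset` and the placeholder rows are hypothetical.

```python
import random


def split_dataset(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and split it into training and test subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder rows: 22 hypothetical crop labels with 100 samples each (2200 rows).
rows = [(sample_id, "crop_%d" % (sample_id % 22)) for sample_id in range(2200)]

train, test = split_dataset(rows)
print(len(train), len(test))
```

Because every class has the same number of samples, a stratified split (sampling 80% within each crop) would keep the classes balanced in both subsets; the plain shuffle shown here only approximates that.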
3.1.7 Build Model
After splitting the data, the model is built using Python libraries and fitted to the training set. The model predicts the suitable crop from the independent features of the dataset. Our dataset has seven independent variables (N, P, K, temperature, humidity, soil pH, and rainfall), and the dependent variable is the crop.
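To make the KNN prediction step concrete, the following sketch classifies a sample by majority vote among its nearest training points. It is an illustration rather than the authors' code: the function name `knn_predict`, the choice of k = 3, the use of only three features, and all feature values are assumptions.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical training samples, already min-max scaled:
# (temperature, humidity, rainfall)
train_features = [
    (0.80, 0.85, 0.90),
    (0.75, 0.80, 0.95),
    (0.78, 0.82, 0.88),
    (0.30, 0.20, 0.10),
    (0.35, 0.25, 0.15),
    (0.32, 0.18, 0.12),
]
train_labels = ["rice", "rice", "rice", "chickpea", "chickpea", "chickpea"]

print(knn_predict(train_features, train_labels, (0.77, 0.83, 0.91)))
print(knn_predict(train_features, train_labels, (0.33, 0.22, 0.11)))
```

KNN relies on distances, so it benefits from the feature scaling step: without it, a feature with a large numeric range such as rainfall would dominate the distance.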
3.2 Module-2
In this module, we collected inputs such as air temperature, water requirement (rainfall, surface water, and groundwater), and soil electrical conductivity to predict the suitable crop for the field using the ML and EL algorithms. For the water requirement, we took rainfall, surface water, and groundwater together. We collected temperature and EC with sensors, while rainfall, surface water, and groundwater values were obtained from the farmer, for six crops: wheat, rice, cotton, maize, gram, and groundnut. After collecting the data, we preprocessed it as shown in Fig. 7 and split the dataset into a training set and a test set.
Figure 7: Data preprocessing
The proposed system limits yield losses by selecting suitable crops for the agricultural field. The system is flexible: farmers can enter the values manually if they know them, and otherwise they can use the IoT-based module to predict the crop that gives the maximum yield. The system lets the farmer learn the values of the soil parameters without soil testing and learn their standard ranges without consulting human experts. The system makes automatic decisions about the crop by considering the environmental and soil parameters.
To evaluate the performance of the proposed approach, we calculated its accuracy, precision, recall, and F1-score. We evaluated the approach with all the machine learning algorithms, such as KNN, DT, RF, SVM, LR, and GB. We analyzed every algorithm by following the same steps that we performed for the KNN algorithm and compared the results.
4.1 Accuracy
Accuracy is an evaluation metric that measures the proportion of correct predictions made by the machine learning model out of the total number of predictions. The formula divides the number of correct predictions by the total number of predictions, as given in Eq. (2):

$$Accuracy = \frac{T_p + T_n}{T_p + T_n + F_p + F_n} \tag{2}$$

where $T_p$ is the number of true positives, $T_n$ is the number of true negatives, $F_p$ is the number of false positives, and $F_n$ is the number of false negatives.
4.2 Precision
To check the correctness of the proposed approach and how well the model predicts the true cases, we used precision. The formula divides the number of true positives by the total number of positive predictions (true positives plus false positives), as given in Eq. (3):

$$Precision = \frac{T_p}{T_p + F_p} \tag{3}$$
4.3 Recall
To measure the sensitivity, we calculate the recall, which divides the number of correctly predicted positives by the total number of actual positives, as given in Eq. (4):

$$Recall = \frac{T_p}{T_p + F_n} \tag{4}$$
4.4 F1-Score
The F1-score is the harmonic mean of precision and recall, as given in Eq. (5):

$$F1\text{-}score = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{5}$$

Figs. 8 and 9 show the confusion matrices for module-1 and module-2. They reflect the accuracy results calculated with the formula given in Eq. (2). The values on the diagonal of each matrix show the number of correct predictions.
Fig. 10 shows the classification report, which provides the precision, recall, and F1-score values of the KNN model for module-2.
Figs. 11 and 12 show the performance comparison of the ML and EL algorithms for module-1 and module-2, respectively.
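The relationship between a multi-class confusion matrix and the metrics of Sections 4.1 to 4.4 can be sketched as follows. The labels and predictions are hypothetical and do not reproduce the reported results; rows of the matrix are actual crops and columns are predicted crops, and the function names are chosen for this illustration.

```python
def confusion_matrix(actual, predicted, labels):
    """Count predictions per (actual, predicted) pair; rows are actual classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def evaluate(matrix, labels):
    """Return overall accuracy and per-class (precision, recall, F1) tuples."""
    size = len(labels)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[i][i] for i in range(size)) / total  # diagonal = correct
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(size)) - tp  # rest of the column
        fn = sum(matrix[i]) - tp                          # rest of the row
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = (precision, recall, f1)
    return accuracy, report


# Hypothetical ground truth and predictions for ten field samples.
labels = ["wheat", "rice", "cotton"]
actual = ["wheat", "wheat", "wheat", "rice", "rice", "rice",
          "cotton", "cotton", "cotton", "cotton"]
predicted = ["wheat", "wheat", "rice", "rice", "rice", "rice",
             "cotton", "cotton", "cotton", "wheat"]

matrix = confusion_matrix(actual, predicted, labels)
accuracy, report = evaluate(matrix, labels)
print(matrix)
print("accuracy:", accuracy)
for label, (precision, recall, f1) in report.items():
    print(label, round(precision, 3), round(recall, 3), round(f1, 3))
```

In the multi-class setting, precision and recall are computed one class at a time by treating that class as positive and all others as negative, which is what a per-class classification report lists.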
Figure 8: Confusion matrix for module-1
Figure 9: Confusion matrix for module-2
Figure 10: Classification report of KNN algorithm for module-2
Figure 11: Accuracy comparisons of the algorithms for module-1
Figure 12: Accuracy comparisons of the algorithms for module-2
Farmers use traditional methods that are time-consuming and expensive, and they also suffer economically because of the reliance on manual labor and the incorrect selection of crops made without knowledge of environmental factors, soil characteristics, and nutrition. Our proposed approach helps farmers obtain maximum yield by selecting the correct crop for the field. It addresses this issue with ML and EL algorithms: module-1 uses static input data for twenty-two crops, performs data analysis, and predicts the suitable crop, while module-2 uses sensor-based and manual data collection for six crops. The experimental results show that the system can predict the suitable crop and can help increase farmers' earnings by maximizing crop yield, making the agricultural profession more competitive. The system is cost-effective because all the hardware is available at low cost. Our study collected data from sensors and farmers but did not use social networking and business data, which could also help to further increase the yield at low cost. The proposed approach considered environmental factors but did not consider weather data specific to any particular zone. The main concern in data collection is data quality; a complex dataset may make smart farming procedures difficult to execute. ML does not need to be explicitly programmed, but it suffers from a lack of interpretability. We used advanced machine learning algorithms that are difficult to interpret but perform better. Overfitting and underfitting of the trained model can reduce model accuracy and, as a result, system performance; our proposed system addresses this issue and tries to minimize it.
Although the proposed system is beneficial for farmers, it is difficult for them to adapt to new farming environments in which they have no role in decision-making. The economic benefit to the farmer is the main motivation for adopting the proposed system. Farmers need to be trained and educated in the use of the system to obtain its maximum benefit.
In the future, more algorithms, such as neural networks, can be used, along with real-time datasets that have more features and more crops. More modules can also be developed, such as crop growth monitoring with sensor cameras and crop disease detection.
Acknowledgement: None.
Funding Statement: This research was supported by the Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2024R54), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Author Contributions: Conceptualization: S. Ramzan, A. Mahmood, B. Ali; methodology: S. Ramzan, A. Mahmood; software: S. Ramzan, Y. Ghadi; validation: H. Aljuaid, Y. Ghadi and S. Ramzan; formal analysis: A. Mahmood, Y. Ghadi; investigation: B. Ali and S. Ramzan; draft manuscript preparation: S. Ramzan, A. Mahmood and H. Aljuaid; visualization: H. Aljuaid, Y. Ghadi. All authors reviewed and agreed to the final version of the manuscript.
Availability of Data and Materials: The dataset used in module-1 can be found at https://www.kaggle.com/datasets/atharvaingle/crop-recommendation-dataset and the dataset used in module-2 is available upon reasonable request.
Conflicts of Interest: The authors declare no conflicts of interest to report regarding the present study.