I am fitting a random forest regression on the same data in R and in Python, but I get very different R² values. I understand that differing hyperparameters could account for some of the gap, but I don't think they would nearly halve the test R² (about 0.78 in Python versus 0.36 in R). The code and results for each are below.
In Python:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# data is a pandas DataFrame with a 'response' column.
# rmse is a helper defined elsewhere (its definition is not shown here).
X = data.drop(['response'], axis=1)
y = data['response']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05, random_state=42)
rdf = RandomForestRegressor(n_estimators=500, oob_score=True)
rdf.fit(X_train, y_train)
# score() returns R^2; multiplied by 100 to print it as a percentage
print("Random Forest Model Score (on Train)", ":", rdf.score(X_train, y_train)*100, ",",
      "Random Forest Model Score (on Test)", ":", rdf.score(X_test, y_test)*100)
y_predicted = rdf.predict(X_train)
y_test_predicted = rdf.predict(X_test)
print("Training RMSE", ":", rmse(y_train, y_predicted),
      "Testing RMSE", ":", rmse(y_test, y_test_predicted))
>Random Forest Model Score (on Train) : 92.2312123 , Random Forest Model Score (on Test) : 78.1812321
>Training RMSE : 5.606443558164292e-06 Testing RMSE : 9.59221499904858e-06
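One number the Python code above does not print is the out-of-bag score. Because the model is built with oob_score=True, scikit-learn stores an OOB R² in rdf.oob_score_. That is probably the closest counterpart to the "% Var explained" line printed by R's randomForest, which is also computed from out-of-bag predictions. A minimal sketch on synthetic data (make_regression and the sample size are stand-ins, not the real data):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption: the original data is not available here).
X, y = make_regression(n_samples=400, n_features=18, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05, random_state=42)

rdf = RandomForestRegressor(n_estimators=100, oob_score=True, random_state=0)
rdf.fit(X_train, y_train)

train_r2 = rdf.score(X_train, y_train)  # optimistic: the trees have seen these rows
oob_r2 = rdf.oob_score_                 # each row is scored only by trees that did not train on it
test_r2 = rdf.score(X_test, y_test)     # based on only 5% of the rows, so it is noisy

print("train R2:", train_r2, "OOB R2:", oob_r2, "test R2:", test_r2)
```

Comparing rdf.oob_score_ with R's "% Var explained" is likely a fairer check than comparing two test scores that each rest on a different 5% of the rows.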
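In R:
> library(randomForest)
> library(MLmetrics)  # R2_Score(y_pred, y_true) is assumed to come from MLmetrics
> # Note: sample() called with a single number n returns a permutation of 1:n, so
> # rows covers only the first 95% of the data and the test set is always the last 5% of rows
> rows <- sample(0.95*nrow(data))
> train_random <- data[rows,]
> test_random <- data[-rows,]
> rf_model <- randomForest(response ~ . ,
                           data = train_random,
                           keep.forest=TRUE,
                           importance=TRUE
  )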
> rf_model
Call:
randomForest(formula = response ~ ., data = train_random, keep.forest = TRUE, importance = TRUE)
Type of random forest: regression
Number of trees: 500
No. of variables tried at each split: 6
Mean of squared residuals: 1.437236e-06
% Var explained: 42.05
> pred_train <- predict(rf_model,train_random)
> pred_test <- predict(rf_model,test_random)
> R2_Score(pred_train, train_random$response)
[1] 0.9014311
> R2_Score(pred_test, test_random$response)
[1] 0.3616823
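One detail of the R split may matter for this comparison. In R, sample(n) with a single number returns a permutation of 1:n, so sample(0.95*nrow(data)) only shuffles the first 95% of the row indices, and data[-rows,] is always the last 5% of the rows. train_test_split, by contrast, draws the test rows from the whole table. If the rows are ordered in any way, the two test sets are not comparable. The sketch below mimics both behaviours with NumPy; the row count of 1000 is made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n = 1000                       # hypothetical row count
idx = np.arange(n)
rng = np.random.default_rng(0)

# Analogue of R's sample(0.95 * n): a permutation of the first 95% of the indices.
r_style_train = rng.permutation(int(0.95 * n))
r_style_test = np.setdiff1d(idx, r_style_train)  # analogue of data[-rows, ]

# scikit-learn's split draws the test rows from the whole index range.
sk_train, sk_test = train_test_split(idx, test_size=0.05, random_state=42)

print("R-style test rows:", r_style_test.min(), "to", r_style_test.max())
print("sklearn test rows:", sk_test.min(), "to", sk_test.max())
```

A fully random split in R would be sample(nrow(data), floor(0.95 * nrow(data))). Exporting one set of test row indices and reusing it in both languages (remembering that R indexes from 1) would remove the split as a source of difference.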
I understand that the two train/test splits do not contain the same rows, but why are the R² values so different, and how can I fit an equivalent random forest in R? I have tried setting the hyperparameters in R to the values used in Python, but the R² values still do not match. Can someone please help?
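For reference when matching hyperparameters, the two libraries have different defaults. R's randomForest uses mtry = floor(p/3) for regression (the "No. of variables tried at each split: 6" in the output above) and nodesize = 5, while recent scikit-learn versions default to max_features=1.0 (all features at each split) and min_samples_leaf=1. Below is a sketch of a scikit-learn forest configured to resemble the R defaults, on synthetic data. The 18 predictors are a guess that is merely consistent with mtry = 6, and treating nodesize as min_samples_leaf is an approximation rather than an exact equivalence.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in with 18 predictors, so that floor(p / 3) = 6 as in the R output.
X, y = make_regression(n_samples=400, n_features=18, noise=10.0, random_state=0)
p = X.shape[1]

r_like = RandomForestRegressor(
    n_estimators=500,              # ntree = 500
    max_features=max(p // 3, 1),   # mtry = floor(p / 3) for regression
    min_samples_leaf=5,            # rough analogue of nodesize = 5
    oob_score=True,
    random_state=0,
)
sk_default = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)

# Fit both forests on the same rows and compare their out-of-bag R^2.
for name, model in [("R-like settings", r_like), ("sklearn defaults", sk_default)]:
    model.fit(X, y)
    print(name, "OOB R2:", model.oob_score_)
```

Even with matched settings, the two implementations use different random number generators and differ in some tree-building details, so close agreement rather than identical R² values is the realistic target.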