When I run CV I get a test mlogloss of about 0.3, but when I use the same parameters to predict on the test set, my Kaggle score is 3.2 – which seems far off to me.
My first idea was that maybe I treated train and test differently. I have looked over it many times, so I am asking here in case something in my xgboost setup is wrong.
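One way to narrow this down is to compute mlogloss on a local holdout with the same metric the leaderboard uses; scikit-learn's `log_loss` implements multi-class log loss. The snippet below is just a sketch with toy data (the labels and uniform predictions are made up for illustration); with five classes, uniform 0.2 probabilities give log loss ln(5) ≈ 1.609, so a score of 3.2 is worse than predicting nothing at all – which usually points at a submission problem rather than a modelling problem.

```python
import math

import numpy as np
from sklearn.metrics import log_loss

# Toy example: 6 samples, 5 classes, uniform predicted probabilities.
y_true = np.array([0, 1, 2, 3, 4, 1])
y_prob = np.full((6, 5), 0.2)

score = log_loss(y_true, y_prob, labels=[0, 1, 2, 3, 4])
print(score)                      # ln(5) ~= 1.609
print(score < 3.2)                # a 3.2 leaderboard score is worse than uniform guessing
assert math.isclose(score, math.log(5))
```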
The target is a column containing the values 0, 1, 2, 3, 4 – five classes, each of which I want to predict.
```python
# splitting
X_fit, X_eval, y_fit, y_eval = train_test_split(
    train, target, test_size=0.15, random_state=1
)

# training model
clf = xgb.XGBClassifier(
    max_depth=4,
    missing=np.nan,
    n_estimators=500,
    learning_rate=0.05,
    subsample=1,
    colsample_bytree=0.9,
    seed=2100,
    objective='multi:softprob',
)
clf.fit(
    X_fit, y_fit,
    early_stopping_rounds=35,
    eval_metric="mlogloss",
    eval_set=[(X_eval, y_eval)],
)

# result
k = clf.predict_proba(test)  # contains 11k rows, 5 columns – seems to be fine
```
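A gap this large often comes from the submission file rather than the model: column *i* of `predict_proba` corresponds to `clf.classes_[i]` (the sorted class labels), so the submission columns must be written in exactly that order. Below is a sketch of that sanity check; it uses scikit-learn's `LogisticRegression` on synthetic data as a stand-in, since any sklearn-style classifier (including `XGBClassifier`) exposes `classes_` the same way.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Stand-in model and data so the snippet runs on its own (5 classes, like the task).
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=6, n_classes=5, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)
proba = clf.predict_proba(X)

# Column i of predict_proba is the probability of class clf.classes_[i];
# a submission must list its probability columns in this exact order.
assert list(clf.classes_) == [0, 1, 2, 3, 4]

# Each row should be a valid probability distribution summing to 1.
assert np.allclose(proba.sum(axis=1), 1.0)
```

If the competition's sample submission orders the class columns differently from `clf.classes_`, reorder the `predict_proba` output to match before writing the CSV.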
This is the link to the data and the competition.
This is the link to my full code.