I was following the snakemake workflow, but running each step separately because of the time limit on the server. I am getting this error:

    Traceback (most recent call last):
      File "/gpfs/ysm/project/christakis/as4258/conda_envs/vamb/bin/vamb", line 8, in
      File "/gpfs/ysm/project/christakis/as4258/conda_envs/vamb/lib/python3.7/site-packages/vamb/__main__.py", line 530, in main
      File "/gpfs/ysm/project/christakis/as4258/conda_envs/vamb/lib/python3.7/site-packages/vamb/__main__.py", line 249, in run
        len(tnfs), minalignscore, minid, subprocesses, logfile)
      File "/gpfs/ysm/project/christakis/as4258/conda_envs/vamb/lib/python3.7/site-packages/vamb/__main__.py", line 101, in calc_rpkm
        rpkms = vamb.vambtools._load_jgi(file, mincontiglength, refhash)
      File "/gpfs/ysm/project/christakis/as4258/conda_envs/vamb/lib/python3.7/site-packages/vamb/vambtools.py", line 598, in _load_jgi
    ValueError: could not convert string to float: 'contigLen'
    slurmstepd: error: get_addr_info: getaddrinfo() failed: Name or service not known
    slurmstepd: error: slurm_set_addr: Unable to resolve ".internal"
    slurmstepd: error: Unable to establish control machine address
    slurmstepd: error: slurm_set_addr: Unable to resolve "."

I'm using sklearn roc_auc_score to evaluate a model from PubChem where the label is a string, 'Active' or 'Inactive', and I keep ending up with a ValueError when it tries to convert the string to a float.

    sklearn.metrics._ranking.roc_auc_score(y_true, y_score, ...)

Both y_true and y_score are ndarray-like. The dtype for both arrays is 'object', because I load them with a pandas DataFrame and convert the columns (Series) with to_numpy. (I tried dtype as 'str', but it gets treated as object anyway.)

    ValueError: could not convert string to float: 'Active'

Looking into the roc_auc_score method, I see what's happening. It first makes these two calls to prepare the input arrays:

    y_true = check_array(y_true, ensure_2d=False, dtype=None)
    y_score = check_array(y_score, ensure_2d=False)

Note that the first call passes in dtype=None. This is the only reason it succeeds where the second call fails. The default for dtype is "numeric" (a string). When check_array finds that the dtype of the original array is 'object' and the dtype passed in is "numeric", it tries to convert the object values into floats. Hence the exception: 'Active' cannot be converted to float. So the first check_array works fine, but the second check_array fails.

My question under all of this: how does roc_auc_score ever deal with binary classifications that are strings? What am I doing wrong?

It looks like roc_auc_score not only doesn't work with a non-numerical y_score, but perhaps does so for a good reason, since passing final class predictions (1, 2, 3, etc.) in numerical form is not right either.

Reading further for your (binary) case, it looks like roc_auc_score expects only numerical values for y_score, accepting either probability estimates or non-thresholded decision values (decision function outputs, for estimators where you can't get probability outputs) to calculate the area under the curve. And while it is not stated explicitly, one might even assume that it should handle final class predictions (in numerical form) as inputs on top of the above.

Reading about ROC AUC and visualizing it, you can see that the main idea of a ROC curve and the area under it (AUC) is to characterize the trade-off between false positive rate and true positive rate at ALL prediction thresholds. So it sounds like supplying final classes might not be the best idea, since you would round away all these details.

Let's take a look:

    # do imports
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from IPython.core.interactiveshell import InteractiveShell

    # in a notebook, display every expression in a cell, not only the last one
    InteractiveShell.ast_node_interactivity = "all"

    X, y = load_breast_cancer(return_X_y=True)
    clf = LogisticRegression(solver="liblinear").fit(X, y)

    # probability of class 1 and raw (non-thresholded) decision values
    prob_prediction_class_1 = clf.predict_proba(X)[:, 1]
    non_threshold_preds = clf.decision_function(X)

    f'roc_auc_score = {roc_auc_score(y, prob_prediction_class_1)}'
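For the roc_auc_score question above, here is a minimal sketch of both the failure and one possible way around it. The arrays are made-up toy data (not PubChem data), and the choice of 'Active' as the positive class is an assumption. It first reproduces the check_array behaviour described in the question, then passes numeric scores instead of string predictions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.utils import check_array

# Hypothetical toy data: string labels as in the question, dtype=object
# as you would get from a pandas Series via to_numpy().
y_true = np.array(["Active", "Inactive", "Active", "Inactive", "Active"], dtype=object)
y_pred_labels = np.array(["Active", "Inactive", "Inactive", "Inactive", "Active"], dtype=object)

# dtype=None leaves the object array as it is, so string labels pass.
checked = check_array(y_true, ensure_2d=False, dtype=None)

# The default dtype="numeric" tries to cast the object array to float,
# which fails for strings such as 'Active'.
try:
    check_array(y_pred_labels, ensure_2d=False)
    conversion_failed = False
except ValueError:
    conversion_failed = True

# One way around it: encode the labels explicitly (assumption: 'Active'
# is the positive class) and pass continuous scores, e.g. the second
# column of predict_proba, rather than final class predictions.
y_true_binary = (y_true == "Active").astype(int)
y_score = np.array([0.9, 0.2, 0.4, 0.6, 0.8])  # made-up probabilities of 'Active'

auc = roc_auc_score(y_true_binary, y_score)
print(f"roc_auc_score = {auc:.4f}")
```

Encoding the labels yourself also makes it explicit which label counts as the positive class, instead of leaving that to the library's label ordering.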