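I am trying to run a data science experiment by following the instructions in this tutorial. The tutorial has 5 lessons, each with several subsections.
For this tutorial, I am using:
- A SQL Server 2016 RC3 virtual machine on Azure, with R Services enabled.
- RRE for Windows 8.0.0 as the data science client/R client (for connecting remotely to SQL Server).
- A new SQL login with read, write and DDL access to the database, used to connect to SQL Server from the R client.
I have successfully completed Lesson 1, which covers creating SQL Server data objects from my R client, querying and modifying SQL Server data, and defining/setting the compute context.
I hit an error at the start of Lesson 2 of the tutorial.
As soon as I change the compute context from local to SQL Server, a simple summary function (rxSummary) throws an error.
The error is shown below: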
C:\Users\...\Project0\DeepDive Experiment.R(109):
Error in try({ : ODBC statement error: [Microsoft][ODBC SQL Server Driver][SQL Server]Could not find stored procedure 'master..xp_ScaleR_init_job'.
Error in rxInDbJobIdParam(schedulerJobInstance, FALSE) : hpcServerJob object has an invalid id. Ensure it was returned from a prior rxStartClusterJob() call
Error in rxStartClusterJob(hpcServerJob, timesIsValidated = TRUE, continueOnFailure = FALSE) :
Error in try({ : ODBC statement error: [Microsoft][ODBC SQL Server Driver][SQL Server]Could not find stored procedure 'master..xp_ScaleR_init_job'.
Error in rxInDbJobIdParam(schedulerJobInstance, FALSE) : hpcServerJob object has an invalid id. Ensure it was returned from a prior rxStartClusterJob() call
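Any help with the following would be appreciated:
- Why does this error occur?
- How do I find/check stored procedures in the master database, i.e. how do I check whether xp_ScaleR_init_job exists?
- If it does not exist, how do I add/create the stored procedure?
For easy reference, here is the complete commented script up to the point where I hit the error: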
###########################################DATA SCIENCE DEEP DIVE TUTORIAL###############################################
##Create the SQL Server Data Objects##
#Provide your database connection string in an R variable.
#DDUser01 is a login created on the sql server instance for remote login.
#It has read, write and ddl access to the DeepDive database.
sqlConnString <- "Driver=SQL Server;Server=*ip address*; Database=DeepDive;Uid=DDUser01;Pwd=*******"
#Specify the name of the table you want to create, and save it in an R variable.
sqlFraudTable <- "ccFraudSmall"
#Chunking
sqlRowsPerRead = 5000
#Define a variable to store the new data source
sqlFraudDS <- RxSqlServerData(connectionString = sqlConnString,table = sqlFraudTable, rowsPerRead = sqlRowsPerRead)
#Create a new R variable, sqlScoreTable, to store the name of the table used for scoring.
sqlScoreTable <- "ccFraudScoreSmall"
#Define a second data source object
sqlScoreDS <- RxSqlServerData(connectionString = sqlConnString,table = sqlScoreTable, rowsPerRead = sqlRowsPerRead)
##Load Data into SQL Tables Using R##
#Create an R variable, and assign to the variable the file path for the CSV file.
ccFraudCsv <- file.path(rxGetOption("sampleDataDir"), "ccFraudSmall.csv")
#RxTextData function to specify the text data source.
inTextData <- RxTextData(file = ccFraudCsv, colClasses = c(
"custID" = "integer", "gender" = "integer", "state" = "integer",
"cardholder" = "integer", "balance" = "integer",
"numTrans" = "integer",
"numIntlTrans" = "integer", "creditLine" = "integer",
"fraudRisk" = "integer"))
#Call rxDataStep to insert the data into the SQL Server table
rxDataStep(inData = inTextData, outFile = sqlFraudDS, overwrite = TRUE)
#Variable for creating a path to the source file - score
ccScoreCsv <- file.path(rxGetOption("sampleDataDir"), "ccFraudScoreSmall.csv")
#RxTextData function to get the data and save it in the variable
inTextData <- RxTextData(file = ccScoreCsv, colClasses = c(
"custID" = "integer", "gender" = "integer", "state" = "integer",
"cardholder" = "integer", "balance" = "integer",
"numTrans" = "integer",
"numIntlTrans" = "integer", "creditLine" = "integer"))
#Call rxDataStep to overwrite the current table with the new schema and data.
rxDataStep(inData = inTextData, outFile = sqlScoreDS, overwrite = TRUE)
##Query the Data ##
#Use the function rxGetVarInfo and specify the data source you want to analyze
rxGetVarInfo(data = sqlFraudDS)
##Modify Metadata##
#Mapping of USA State abbreviations (categorical) to their integer identifiers
#Create an R variable that holds the vector of strings to add to it - different states of the USA.
stateAbb <- c("AK", "AL", "AR", "AZ", "CA", "CO", "CT", "DC",
"DE", "FL", "GA", "HI","IA", "ID", "IL", "IN", "KS", "KY", "LA",
"MA", "MD", "ME", "MI", "MN", "MO", "MS", "MT", "NB", "NC", "ND",
"NH", "NJ", "NM", "NV", "NY", "OH", "OK", "OR", "PA", "RI","SC",
"SD", "TN", "TX", "UT", "VA", "VT", "WA", "WI", "WV", "WY")
#Create a column information object that specifies the mapping of the existing integer values to the categorical levels
#This statement also creates factor variables for gender and cardholder.
ccColInfo <- list(
gender = list(
type = "factor",
levels = c("1", "2"),
newLevels = c("Male", "Female")),
cardholder = list(type = "factor",
levels = c( "1", "2"),
newLevels = c("Principal", "Secondary")),
state = list(type = "factor", levels = as.character(1:51), newLevels = stateAbb)
)
#Update the SQL Server data source that uses the updated data
sqlFraudDS <- RxSqlServerData(connectionString = sqlConnString,
table = sqlFraudTable, colInfo = ccColInfo,
rowsPerRead = sqlRowsPerRead)
#Query new information
rxGetVarInfo(data = sqlFraudDS)
##Create and Set a Compute Context##
#Specify the connection string for the instance where computations will take place.
sqlConnString <- "Driver=SQL Server;Server=*ip address*; Database=DeepDive;Uid=DDUser01;Pwd=*******"
#Specify the location of the shared directory (temp folder for workspace objects) and save it in a variable.
sqlShareDir <- paste("c:\\AllShare\\", Sys.getenv("USERNAME"), sep="")
#Create shared directory if it does not exist
if (!file.exists(sqlShareDir)) dir.create(sqlShareDir, recursive = TRUE)
#Specify how you want the output handled.
#Here, you are indicating that the R session on the workstation should always wait for R job results,
#but not return console output from remote computations.
sqlWait <- TRUE
sqlConsoleOutput <- FALSE
#Define the compute context object
sqlCompute <- RxInSqlServer(
connectionString = sqlConnString,
shareDir = sqlShareDir,
wait = sqlWait,
consoleOutput = sqlConsoleOutput)
#ALTERNATIVE:Enable Tracing on the Compute Context
sqlComputeTrace <- RxInSqlServer(
connectionString = sqlConnString,
shareDir = sqlShareDir,
wait = sqlWait,
consoleOutput = sqlConsoleOutput,
traceEnabled = TRUE, traceLevel = 7)
#Change Compute Context to the Server
rxSetComputeContext(sqlCompute)
##Compute Summary Statistics##
#Compute summary statistics for several of the variables
##THIS IS WHERE I FACE THE ERROR##
sumOut <- rxSummary(formula = ~gender + balance + numTrans + numIntlTrans + creditLine, data = sqlFraudDS)
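Apparently, SQL Server 2016 requires an R Server client (RRE) of at least version 8.0.3. This site discusses it. I also got the same answer from Microsoft support.
My R server is RRE 8.0.0, which is probably why I got the error. When I installed Microsoft R Client instead, the script worked (except for rxCube)! I was able to push computations to SQL Server and complete the tutorial.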
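A minimal sketch of checking the client version before switching the compute context. It assumes the version of the installed RevoScaleR package tracks the RRE release number (8.0.0, 8.0.3, and so on), which I have not verified for every release. The names minClientVersion and meetsMinVersion are made up for this example and are not part of RevoScaleR.

```r
# Minimum client version quoted above (taken from this answer, not from an API).
minClientVersion <- "8.0.3"

# Hypothetical helper: TRUE when `installed` is at least `required`.
# utils::compareVersion() returns -1, 0 or 1.
meetsMinVersion <- function(installed, required = minClientVersion) {
  utils::compareVersion(as.character(installed), required) >= 0
}

# Only inspect the installed package when RevoScaleR is actually available.
# Assumption: the package version matches the RRE / R Client release number.
if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  installed <- utils::packageVersion("RevoScaleR")
  if (!meetsMinVersion(installed)) {
    warning("RevoScaleR ", as.character(installed),
            " is older than ", minClientVersion,
            "; remote computation on SQL Server 2016 may fail.")
  }
}
```

Running this before rxSetComputeContext(sqlCompute) would flag a version mismatch on the client up front, rather than surfacing it later as the missing stored procedure error.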
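If you type
rxGetOption("demoScriptsDir")
into the R console, you should see the directory where the sample scripts are located. Work through the script
RevoScaleR_SqlServer_GettingStarted.r
since it contains all the code; I think you are missing some bits (such as the sqlShareDir bit). I would also say that these scripts are not perfect: I ran into a lot of problems, and sometimes running them twice helped.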
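As a rough sketch, the script can be located from R like this. findDemoScript is a hypothetical helper written for this example; only rxGetOption("demoScriptsDir") and the script file name come from the text above.

```r
# Hypothetical helper: return the full path to a demo script inside `demoDir`,
# or NA if no such file exists there.
findDemoScript <- function(demoDir,
                           scriptName = "RevoScaleR_SqlServer_GettingStarted.r") {
  path <- file.path(demoDir, scriptName)
  if (file.exists(path)) path else NA_character_
}

# Only ask RevoScaleR for its demo directory when the package is installed.
if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  scriptPath <- findDemoScript(RevoScaleR::rxGetOption("demoScriptsDir"))
  if (is.na(scriptPath)) {
    message("Getting-started script not found in the demo scripts directory.")
  } else {
    message("Getting-started script: ", scriptPath)
  }
}
```

From there you can open the file in your editor and compare it section by section with your own script.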