To avoid putting too much load on the primary SQL Server, I would like to install and enable R Services on a secondary replica of the primary server. Is that possible?
Also, can I push computations from a remote R client to the secondary server that has R Services enabled?
Would this have any impact on the primary?
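In case it matters, here is a minimal sketch of how I imagine pointing the remote R client at the secondary (assuming R Services is installed on the secondary replica's instance and the replica is readable; the listener name, database, credentials, and the ApplicationIntent routing are placeholders/assumptions on my part):
library(RevoScaleR)
#Placeholder connection string aimed at the readable secondary.
#ApplicationIntent=ReadOnly only routes to a secondary when read-only routing is configured.
secondaryConnString <- "Driver=SQL Server;Server=*AG listener or secondary*;Database=*database*;Uid=*user*;Pwd=*****;ApplicationIntent=ReadOnly"
#Compute context that pushes RevoScaleR work to that instance instead of the primary.
secondaryCompute <- RxInSqlServer(
    connectionString = secondaryConnString,
    wait = TRUE,
    consoleOutput = FALSE)
rxSetComputeContext(secondaryCompute)
#Subsequent rxSummary/rxLinMod/etc. calls would then run on the secondary.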
I am trying to run a data science experiment following the instructions in this tutorial. The tutorial has 5 lessons, and each lesson has several sub-sections.
For this tutorial, I am using
I have successfully completed Lesson 1, i.e. creating the SQL Server data objects from my R client, querying and modifying the SQL Server data, and defining/setting the compute context.
I hit an error right at the beginning of Lesson 2 of the tutorial.
As soon as I change the compute context from local to SQL Server, a simple summary function (rxSummary) throws an error.
The error looks like this:
C:\Users\...\Project0\DeepDive Experiment.R(109): Error in try({ :
  ODBC statement error: [Microsoft][ODBC SQL Server Driver][SQL Server]Could not find stored procedure 'master..xp_ScaleR_init_job'.
Error in rxInDbJobIdParam(schedulerJobInstance, FALSE) :
  hpcServerJob object has an invalid id. Ensure it was returned from a prior rxStartClusterJob() call
Error in rxStartClusterJob(hpcServerJob, timesIsValidated = TRUE, continueOnFailure = FALSE) :
  Error in try({ :
  ODBC statement error: [Microsoft][ODBC SQL Server Driver][SQL Server]Could not find stored procedure 'master..xp_ScaleR_init_job'.
Error in rxInDbJobIdParam(schedulerJobInstance, FALSE) :
  hpcServerJob object has an invalid id. Ensure it was returned from a prior rxStartClusterJob() call
Any help would be greatly appreciated.
For easy reference, here is the full annotated script up to the point where I hit the error:
###########################################DATA SCIENCE DEEP DIVE TUTORIAL###############################################
##Create the SQL Server Data Objects##
#Provide your database connection string in an R variable.
#DDUser01 is a login created on the sql server instance for remote login.
#It has read, write and ddl access to the DeepDive database.
sqlConnString <- "Driver=SQL Server;Server=*ip address*; Database=DeepDive;Uid=DDUser01;Pwd=*******"
#Specify the name of the table you want to create, and save it in an R variable.
sqlFraudTable <- "ccFraudSmall"
#Chunk size: number of rows to read per chunk
sqlRowsPerRead = 5000
#Define a variable to store the new data source
sqlFraudDS <- RxSqlServerData(connectionString = sqlConnString,table = sqlFraudTable, rowsPerRead = sqlRowsPerRead)
#Create a new R variable, sqlScoreTable, to store the name of the table used for scoring.
sqlScoreTable <- "ccFraudScoreSmall"
#Define a second data source object
sqlScoreDS <- RxSqlServerData(connectionString = sqlConnString,table = sqlScoreTable, rowsPerRead = sqlRowsPerRead)
##Load Data into SQL Tables Using R##
#Create an R variable, and assign to the variable the file path for the CSV file.
ccFraudCsv <- file.path(rxGetOption("sampleDataDir"), "ccFraudSmall.csv")
#RxTextData function to specify the text data source.
inTextData <- RxTextData(file = ccFraudCsv, colClasses = c(
    "custID" = "integer", "gender" = "integer", "state" = "integer",
    "cardholder" = "integer", "balance" = "integer",
    "numTrans" = "integer",
    "numIntlTrans" = "integer", "creditLine" = "integer",
    "fraudRisk" = "integer"))
#Call rxDataStep to insert the data into the SQL Server table
rxDataStep(inData = inTextData, outFile = sqlFraudDS, overwrite = TRUE)
#Variable for creating a path to the source file - score
ccScoreCsv <- file.path(rxGetOption("sampleDataDir"), "ccFraudScoreSmall.csv")
#RxTextData function to get the data and save it in the variable
inTextData <- RxTextData(file = ccScoreCsv, colClasses = c(
    "custID" = "integer", "gender" = "integer", "state" = "integer",
    "cardholder" = "integer", "balance" = "integer",
    "numTrans" = "integer",
    "numIntlTrans" = "integer", "creditLine" = "integer"))
#Call rxDataStep to overwrite the current table with the new schema and data.
rxDataStep(inData = inTextData, outFile = sqlScoreDS, overwrite = TRUE)
##Query the Data ##
#Use the function rxGetVarInfo and specify the data source you want to analyze
rxGetVarInfo(data = sqlFraudDS)
##Modify Metadata##
#Mapping of USA State abbreviations (categorical) to their integer identifiers
#Create an R variable that holds the vector of strings to add to it - different states of the USA.
stateAbb <- c("AK", "AL", "AR", "AZ", "CA", "CO", "CT", "DC",
"DE", "FL", "GA", "HI","IA", "ID", "IL", "IN", "KS", "KY", "LA",
"MA", "MD", "ME", "MI", "MN", "MO", "MS", "MT", "NB", "NC", "ND",
"NH", "NJ", "NM", "NV", "NY", "OH", "OK", "OR", "PA", "RI","SC",
"SD", "TN", "TX", "UT", "VA", "VT", "WA", "WI", "WV", "WY")
#Create a column information object that specifies the mapping of the existing integer values to the categorical levels
#This statement also creates factor variables for gender and cardholder.
ccColInfo <- list(
    gender = list(
        type = "factor",
        levels = c("1", "2"),
        newLevels = c("Male", "Female")),
    cardholder = list(type = "factor",
        levels = c("1", "2"),
        newLevels = c("Principal", "Secondary")),
    state = list(type = "factor", levels = as.character(1:51), newLevels = stateAbb)
)
#Update the SQL Server data source that uses the updated data
sqlFraudDS <- RxSqlServerData(connectionString = sqlConnString,
    table = sqlFraudTable, colInfo = ccColInfo,
    rowsPerRead = sqlRowsPerRead)
#Query new information
rxGetVarInfo(data = sqlFraudDS)
##Create and Set a Compute Context##
#Specify the connection string for the instance where computations will take place.
sqlConnString <- "Driver=SQL Server;Server=*ip address*; Database=DeepDive;Uid=DDUser01;Pwd=*******"
#Specify the location of the shared directory (temp folder for workspace objects) and save it in a variable.
sqlShareDir <- paste("c:\\AllShare\\", Sys.getenv("USERNAME"), sep="")
#Create shared directory if it does not exist
if (!file.exists(sqlShareDir)) dir.create(sqlShareDir, recursive = TRUE)
#Specify how you want the output handled.
#Here, you are indicating that the R session on the workstation should always wait for R job results,
#but not return console output from remote computations.
sqlWait <- TRUE
sqlConsoleOutput <- FALSE
#Define the compute context object
sqlCompute <- RxInSqlServer(
    connectionString = sqlConnString,
    shareDir = sqlShareDir,
    wait = sqlWait,
    consoleOutput = sqlConsoleOutput)
#ALTERNATIVE: Enable tracing on the compute context
sqlComputeTrace <- RxInSqlServer(
    connectionString = sqlConnString,
    shareDir = sqlShareDir,
    wait = sqlWait,
    consoleOutput = sqlConsoleOutput,
    traceEnabled = TRUE,
    traceLevel = 7)
#Change Compute Context to the Server
rxSetComputeContext(sqlCompute)
##Compute Summary Statistics##
#Compute summary statistics for several of the variables
##THIS IS WHERE I FACE THE ERROR##
sumOut <- rxSummary(formula = ~gender + balance + numTrans + numIntlTrans + creditLine, data = sqlFraudDS)
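In case it helps narrow this down, here is a small diagnostic sketch reusing the objects defined above (sqlComputeTrace and sqlFraudDS); re-running locally is only my assumption about how to isolate whether the problem is the data source or the in-database components on the remote instance:
#Re-run the failing summary with the trace-enabled context defined earlier to capture more detail.
rxSetComputeContext(sqlComputeTrace)
sumOut <- rxSummary(formula = ~gender + balance + numTrans + numIntlTrans + creditLine,
    data = sqlFraudDS)
#Switch back to the local compute context; if this works, the data source and query are fine
#and the failure is specific to running the job inside the remote SQL Server instance.
rxSetComputeContext("local")
rxSummary(formula = ~gender + balance, data = sqlFraudDS)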
Suppose I have 3 machines.
My first machine has SQL Server 2016 with R Services.
My second machine has SQL Server 2016, but without R Services enabled.
My third machine has a standalone Microsoft R Server installed.
I understand that, because my first machine has SQL Server with R Services, I can store R scripts as stored procedures on that SQL Server and call them later for analysis.
My question is about how my standalone Microsoft R Server connects to SQL Server (with and without R Services).
As I understand it, in both cases it connects to SQL Server over ODBC and uses RevoScaleR functions to query or analyze the SQL data.
If that is the case, imagine that I am reluctant to enable R Services on my SQL Server because it could eat memory and degrade the performance of the database engine itself. All I want to do is get a standalone Microsoft R Server, connect it to the SQL Server without R Services, and get all the scalability and performance that way.
What am I missing by doing it this way?
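Here is a rough sketch of the two setups as I understand them (server name, database, table, and credentials are placeholders); please correct me if the assumption is wrong:
library(RevoScaleR)
connStr <- "Driver=SQL Server;Server=*server*;Database=*database*;Uid=*user*;Pwd=*****"
fraudDS <- RxSqlServerData(connectionString = connStr, table = "*table*")
#Against the machine WITHOUT R Services: only a local compute context is possible,
#so the standalone R Server pulls the rows over ODBC and does all the work itself.
rxSetComputeContext("local")
rxSummary(~balance + numTrans, data = fraudDS)
#Pushing the computation into the database engine needs R Services (In-Database)
#on that instance; this context would fail against the machine without R Services.
rxSetComputeContext(RxInSqlServer(connectionString = connStr))
rxSummary(~balance + numTrans, data = fraudDS)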
Besides the integrated R Services in SQL Server 2016, Microsoft also offers the enterprise-grade Microsoft R Server as a standalone installation. Some documentation on MSDN suggests that the standalone MS R Server is different from the data science Microsoft R Client.
Here is the picture from the MSDN page (bottom left): https://msdn.microsoft.com/en-us/library/mt696069.aspx
The data science client link takes you here: https://msdn.microsoft.com/en-us/library/mt696067.aspx
But the setup wizard's remarks say they are the same thing.
If anyone has a clear understanding of this, please advise.
I have only recently started trying out SQL Server 2016, so please correct me if my assumption is wrong: from some research into SQL Server R Services, I found that the RxHDFSConnect and RxHDFSFileSystem functions help load data from Hadoop directly into a SQL Server 2016 database.
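To make my assumption concrete, here is a rough sketch of what I imagine such a transfer would look like (host name, paths, table name, and connection details are placeholders, and I have not verified this end to end):
library(RevoScaleR)
#HDFS file system object pointing at the cluster's name node (placeholder host/port).
hdfsFS <- RxHdfsFileSystem(hostName = "*name node*", port = 8020)
#Text data source that lives on HDFS.
hdfsCsv <- RxTextData(file = "/data/somefile.csv", fileSystem = hdfsFS)
#SQL Server 2016 destination table (placeholder connection string).
sqlConnStr <- "Driver=SQL Server;Server=*server*;Database=*database*;Uid=*user*;Pwd=*****"
sqlDest <- RxSqlServerData(connectionString = sqlConnStr, table = "*table*")
#Stream the rows from HDFS into the SQL Server table, assuming the machine running R
#can reach both HDFS and the SQL Server instance.
rxSetComputeContext("local")
rxDataStep(inData = hdfsCsv, outFile = sqlDest, overwrite = TRUE)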
I have been experimenting with SQL Server 2016 and its R Services. I also have a standalone Microsoft R Server installed on my machine.
What would be a good use case for SQL Server R Services over the standalone Microsoft R Server, given that both offer the features of an enterprise-grade R platform?
I have a database schema:
member(memb_no, name, age)
book(isbn, title, authors, publisher)
borrowed(memb_no, isbn, date)
Here is the question:
For each publisher, print the names of the members who have borrowed more than five books from that publisher.
How do I write a query for this?
Here is my attempt:
Select B.publisher,
M.memb_no, M.name
From book as B,
member as M,
borrowed as R
Where M.memb_no = R.memb_no and B.isbn = R.isbn and R.isbn in
(select B.publisher, count (R.isbn)
from borrowed as R and book as B
where B.isbn = R.isbn
group by B.publisher
having count >5);
Please point out the errors and explain them.