I have pipelines that appear to execute simultaneously at 12:49:09, see the image. Is it possible to delay execution between pipeline runs? The image looks like two pipelines, but it is actually a single pipeline that executed twice. I would like to delay the execution of each pipeline run. Is this possible?
Can anyone tell me where to find the concurrency setting? I can't seem to locate it in the general pipeline settings.
I have the following configuration in the pipeline expression builder of my Copy activity:
@concat('SELECT * FROM ', pipeline().parameters.Domain,
'.', pipeline().parameters.TableName)
This successfully copies the data into our SQL Server table dbo.MyTable.
I would like to append a suffix or some additional characters to the table name so that the data is copied into the SQL Server table dbo.MyTableV2 instead.
Can anyone tell me how to add extra characters to the table name?
For example, this does not work:
@concat('SELECT * FROM ', pipeline().parameters.Domain,
'.', pipeline().parameters.TableName, 'V2')
Any ideas?
I have the following query in an ADF Copy activity:
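As a sanity check on what the expression should yield, the same concatenation can be mirrored in plain Python (the domain and table values below are made-up placeholders, not the pipeline's real parameters):

```python
# Hypothetical values standing in for pipeline().parameters.Domain / .TableName
domain = "dbo"
table_name = "MyTable"

# Mirrors @concat('SELECT * FROM ', Domain, '.', TableName, 'V2')
query = "".join(["SELECT * FROM ", domain, ".", table_name, "V2"])
print(query)  # SELECT * FROM dbo.MyTableV2
```

If the built string already looks right, the failure may lie on the SQL Server side (for example, dbo.MyTableV2 not existing yet) rather than in the expression itself.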
SELECT
deltaTable.*
FROM Data.deltaTable
LEFT OUTER JOIN Data.targetTable
ON deltaTable.signature = targetTable.signature
WHERE targetTable.signature IS NULL
Can anyone tell me how to parameterize this query? When I try, I get the error:
Parameter schema was not found under EX_SourceToRaw_Single_Table
The following code is my attempt:
@concat('SELECT * FROM ',pipeline().parameters.schema,'.',pipeline().parameters.DeltaTable)
LEFT OUTER JOIN pipeline().parameters.schema,'.',pipeline().parameters.TargetTable)
ON pipeline().parameters.DeltaTable).signature = pipeline().parameters.TargetTable).signature
WHERE pipeline().parameters.TargetTable).signature IS NULL
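The attempt above mixes raw SQL and parameter references outside the @concat call. As a sketch of the target: everything except the parameter values must sit inside quoted string literals in a single @concat (or a single interpolated string). The Python below only illustrates the string being assembled — the schema and table names are placeholders:

```python
# Placeholders for pipeline().parameters.schema / .DeltaTable / .TargetTable
schema = "Data"
delta_table = "deltaTable"
target_table = "targetTable"

# The whole statement is one string; the JOIN, ON and WHERE clauses are
# literals, with only the schema/table names substituted in.
query = (
    f"SELECT d.* FROM {schema}.{delta_table} d "
    f"LEFT OUTER JOIN {schema}.{target_table} t "
    f"ON d.signature = t.signature "
    f"WHERE t.signature IS NULL"
)
print(query)
```

Using short aliases (d, t) avoids having to repeat the parameterized table names in the ON and WHERE clauses.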
Both deltaTable and TargetTable look like this:
============================================================================================================================
| CountryName | CountryISO2 | CountryISO3 | SalesRegion | signature                                                        |
============================================================================================================================
| Belgium     | CHA         | 10          | EMEA        | 800e559a27d68f0478b61c4c9f009e2418e866971b6e54b549b51b1367ab450d |
----------------------------------------------------------------------------------------------------------------------------
| Wales       | steveO      | WAL         | Welsh       | e8c5149d54986dfe9ac95a60a76b07603fe17c282b552ec8255f123b279a533a |
----------------------------------------------------------------------------------------------------------------------------
| Germany     | DE          | deletedupd  | EMEA        | 1232b1bd91d14a87ed830f770d74cd8cabb871535c4c2b7ff5bcb873fa80d851 |
----------------------------------------------------------------------------------------------------------------------------
| Italy       | IT          | ITA         | EMEA        | 584cf66de2f4af9eb4dbfebefea808b1b4e6a35787fcac1061de88cfb79856df |
----------------------------------------------------------------------------------------------------------------------------
Can someone tell me how to add a cryptographic hash to a field in Azure Data Factory?
For example, I have an existing table, I want to add an additional column called "signature", and I want to generate a SHA-256 cryptographic hash for the "signature" column.
I know it's easy to add a column to a table in an ADF Copy activity (see the image below), but I don't know how to populate that column with the cryptographic hash.
I tried modifying my query to apply the hash, but I get a syntax error: comma missing between parameters.
The original query is as follows:
@concat('SELECT * FROM ',pipeline().parameters.Domain,'.',pipeline().parameters.TableName)
The new query modifies the above as follows:
@concat('SELECT *, HASHBYTES('SHA2_256', CAST(signature AS NVARCHAR(MAX))) AS Signature FROM ',pipeline().parameters.Domain,'.',pipeline().parameters.TableName)
However, I'm not sure where the missing comma should go.
I thought I had fixed the query with the following:
@concat('SELECT *, HASHBYTES(SHA2_256, , CAST(signature AS NVARCHAR(MAX))) AS Signature FROM ',pipeline().parameters.Domain,'.',pipeline().parameters.TableName)
However, when I execute the Copy activity I get the following error:
Details: A failure occurred on the 'Source' side. 'Type=Microsoft.Data.SqlClient.SqlException,Message=Incorrect syntax near ','.,Source=Framework Microsoft SqlClient Data Provider,'
I have modified the Copy activity as follows, but I still get the same error.
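Two points worth checking, offered as assumptions: in ADF expression strings a literal single quote is escaped by doubling it (so the function name would be written ''SHA2_256'' inside the @concat literal), which is the usual cause of quoting errors like the one above; and SQL Server's HASHBYTES hashes the raw bytes of its input, and NVARCHAR bytes are UTF-16LE, so reproducing the digest elsewhere requires the same encoding. A minimal Python sketch of the equivalent SHA-256 computation:

```python
import hashlib

# Hypothetical column value; NVARCHAR in SQL Server is UTF-16LE, so
# HASHBYTES('SHA2_256', N'hello') hashes the UTF-16LE bytes of the string.
value = "hello"
digest_utf16 = hashlib.sha256(value.encode("utf-16-le")).hexdigest()
digest_utf8 = hashlib.sha256(value.encode("utf-8")).hexdigest()

print(digest_utf16)                 # 64 hex characters
print(digest_utf16 != digest_utf8)  # True: the encoding changes the digest
```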
I am learning with Databricks Academy. Databricks ships with training data stored in ADLS that can be used for the courses.
However, the data seems to be inaccessible. We get the following error:
com.microsoft.azure.datalake.store.ADLException: Error getting info for file /dbacademy/people10m.parquet
The location of the data is:
people10m = spark.read.parquet("adl://devszendsadlsrdpacqncd.azuredatalakestore.net/dbacademy/people10m.parquet")
Can someone explain why we cannot access the data?
To explain the problem more clearly: the following link shows a Databricks notebook for learning aggregations, JOINs, and nested queries. To use the notebook, you first need to run the classroom setup with the following code: %run "./Includes/Classroom-Setup"
This executes the following code in a notebook named "Classroom-Setup":
people10m = spark.read.parquet("adl://devszendsadlsrdpacqncd.azuredatalakestore.net/dbacademy/people10m.parquet")
However, when the notebook runs the code, I get the following error:
com.microsoft.azure.datalake.store.ADLException: Error getting info for file /dbacademy/people10m.parquet
So could someone let me know why I get the error, and suggest a workaround?
Can someone tell me whether it is possible to copy all tables, stored procedures, and views from one SQL DB to an Azure SQL DB with a single Copy activity?
For the source dataset, I have the following Copy activity:
I believe the above will copy and create all the tables, but I'm not sure whether it also copies and creates the stored procedures, views, etc.
Based on the answer provided by @Bhavani, could someone show me how to
add source and sink datasets with two string parameters, Schema and Table, defining them as @dataset().Schema for the schema and @dataset().Table for the table?
I have added Schema and TableName as described (see the image), but I get the error "A table is required for Copy activity".
I'm nearly there. I fixed the "A table is required for Copy activity" error. Now I get the error: "The expression 'length(activity('Lookup1').output.value)' cannot be evaluated because property 'value' doesn't exist, available properties are 'firstRow, effectiveIntegrationRuntime, billingReference, durationInQueue'."
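The error text suggests "First row only" is enabled on Lookup1. The two output shapes can be illustrated with plain dictionaries (an assumption based on the error message, not on the actual pipeline JSON):

```python
# With "First row only" enabled (the apparent current setting),
# the Lookup output exposes a single object under 'firstRow':
lookup_first_row = {"firstRow": {"Schema": "dbo", "Table": "T1"}}

# With "First row only" disabled, Lookup returns a list under 'value',
# which is what length(activity('Lookup1').output.value) expects:
lookup_all_rows = {"value": [{"Schema": "dbo", "Table": "T1"},
                             {"Schema": "dbo", "Table": "T2"}]}

print("value" in lookup_first_row)    # False -> the reported error
print(len(lookup_all_rows["value"]))  # 2
```

If the downstream activity really needs every row, unticking "First row only" on the Lookup is the usual fix, so that output.value exists.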
The following Python is intended to generate an ERD using Visual Studio Code.
The diagram is created locally with matplotlib. The code executes without any errors, but the ERD figure appears blank.
The Python code is as follows:
import matplotlib.pyplot as plt

# Define the entities and their attributes for the ERD
entities = {
    "Customer": ["CustomerID (PK)", "CustomerName", "ContactInfo"],
    "CreditCardAccount": ["AccountID (PK)", "AccountStatus", "Balance", "CustomerID (FK)"],
    "CreditCard": ["CardID (PK)", "CardNumber", "ExpiryDate", "AccountID (FK)", "BrandID (FK)"],
    "CreditCardBrand": ["BrandID (PK)", "BrandName", "CardType"],
    "SecondaryCardHolder": ["SecondaryHolderID (PK)", "HolderName", "RelationToPrimary", "AccountID (FK)"],
    "PurchaseTransaction": ["TransactionID (PK)", "TransactionDate", "Amount", "CardID (FK)", "RetailerID (FK)"],
    "Retailer": ["RetailerID (PK)", "RetailerName", "Location"],
    "MonthlyStatement": ["StatementID (PK)", "StatementDate", "OutstandingBalance", "AccountID (FK)"],
    "CustomerServiceInteraction": ["InteractionID (PK)", "InteractionDate", "Notes", "CustomerID (FK)"],
}

# Relationships between entities
relationships = [
    ("Customer", "CreditCardAccount", "1:M"),
    ("CreditCardAccount", "CreditCard", "1:M"),
    ("CreditCard", "CreditCardBrand", "M:1"),
    ("CreditCardAccount", "SecondaryCardHolder", "1:M"),
    ("CreditCard", "PurchaseTransaction", "1:M"),
    ("PurchaseTransaction", "Retailer", "M:1"),
    ("CreditCardAccount", "MonthlyStatement", "1:M"),
    ("Customer", "CustomerServiceInteraction", "1:M"),
]

# Plotting the ERD
fig, ax = plt.subplots(figsize=(12, 8))

# Define positions for the entities
positions = {
    "Customer": (1, 5),
    "CreditCardAccount": (4, 5),
    "CreditCard": (7, 5),
    "CreditCardBrand": (10, 5),
    "SecondaryCardHolder": (4, 3),
    "PurchaseTransaction": (7, 3),
    "Retailer": (10, 3),
    "MonthlyStatement": (4, 1),
    "CustomerServiceInteraction": (1, 3),
}

# Draw entities as boxes
for entity, position in positions.items():
    plt.text(position[0], position[1], f"{entity}\n" + "\n".join(entities[entity]),
             ha='center', va='center',
             bbox=dict(facecolor='lightblue', edgecolor='black', boxstyle='round,pad=0.5'))

# Draw relationships as lines
for rel in relationships:
    start_pos = positions[rel[0]]
    end_pos = positions[rel[1]]
    ax.annotate("",
                xy=end_pos, xycoords='data',
                xytext=start_pos, textcoords='data',
                arrowprops=dict(arrowstyle="->", lw=1.5, color='black'),
                )
    # Add cardinality
    midpoint = ((start_pos[0] + end_pos[0]) / 2, (start_pos[1] + end_pos[1]) / 2)
    ax.text(midpoint[0], midpoint[1], rel[2], ha='center', va='center', fontsize=10)

# Hide axes
ax.set_axis_off()

# Show the ERD diagram
plt.title("Entity Relationship Diagram (ERD) for Credit Card Company", fontsize=16)
plt.show()
Can anyone tell me why the ERD does not appear?
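One plausible cause (an assumption, not confirmed): matplotlib does not autoscale the axes for text and annotations, so the axes keep their default (0, 1) limits and boxes placed at coordinates like (10, 5) land far outside the visible area. A minimal sketch of the usual fix — setting the limits explicitly to cover the coordinates used:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(12, 8))

# Text alone does not expand the axes; without explicit limits they stay (0, 1)
ax.text(10, 5, "CreditCardBrand", ha="center", va="center",
        bbox=dict(facecolor="lightblue", edgecolor="black"))

# Make the data coordinates used for the boxes (x up to 10, y up to 5) visible
ax.set_xlim(0, 11)
ax.set_ylim(0, 6)
ax.set_axis_off()
fig.savefig("erd_sketch.png")
```

If this is not it: in VS Code, plt.show() with a non-interactive backend can also display nothing, so saving with fig.savefig is a simple way to check what is actually being drawn.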
I am trying to create a Python wheel for use in Databricks, and I am using VS Code to build the wheel.
I have the following setup.py file:
import setuptools

with open("QuickStart.MD", "r") as fh:
    long_description = fh.read()

setuptools.setup(
    name="newlib",
    version="0.1.8",
    author="name",
    author_email="[email protected]",
    description="framework",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="",
    packages=setuptools.find_packages(),
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
    python_requires='>=3.6',
    install_requires=[
        'pyodbc',
        'jsonschema'
    ]
)
My README.md file looks like this:
# Example Package
This is a simple example package. You can use
[Github-flavored Markdown](https://guides.github.com/features/mastering-markdown/)
to write your content
When I run python setup.py bdist_wheel
I get the error:
invalid command 'bdist_wheel'
Can anyone let me know where I am going wrong?
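A likely cause (an assumption worth checking first): setuptools gets the bdist_wheel command from the separate wheel package, and "invalid command 'bdist_wheel'" is what you see when that package is missing from the environment running setup.py. A quick check:

```python
import importlib.util

# 'invalid command bdist_wheel' from setup.py usually means the separate
# 'wheel' package is not installed in this environment.
have_wheel = importlib.util.find_spec("wheel") is not None

if have_wheel:
    print("wheel is available: 'python setup.py bdist_wheel' should work")
else:
    print("run 'pip install wheel' first, then 'python setup.py bdist_wheel'")
```

Running pip install wheel in the same interpreter that runs setup.py (the one VS Code has selected) is the usual fix.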
The result of a SQL query produces the table below:
|SEDOL | ISIN | Cash |
=============================
|ZZZ0072|GB00B7JYLW09| null |
-----------------------------
|CASH |GB00B7JYLW09| null |
-----------------------------
|ZZZ0072|GB00B7JYLW09| null |
-----------------------------
|ZZZ0009|GB00B7JYLW09| null |
-----------------------------
I need help with a query that replaces the null values in the Cash field with values (any values) from the IsCash field of a table called testtable, returning the table below:
=============================
|SEDOL | ISIN | Cash |
=============================
|ZZZ0072|GB00B7JYLW09|150146|
-----------------------------
|CASH |GB00B7JYLW09| 182 |
-----------------------------
|ZZZ0072|GB00B7JYLW09| 1190 |
-----------------------------
|ZZZ0009|GB00B7JYLW09| 2000 |
-----------------------------
testtable is:
| IsCash |
==============
| 36 |
--------------
| 150146 |
--------------
| 182 |
--------------
| 2000 |
--------------
| 200952 |
--------------
| 200000 |
--------------
| 350000 |
--------------
| 150000 |
--------------
| 1190 |
--------------
I think I need to apply ISNULL to the Cash field, but I'm not sure which clause to use to pull the replacement values from testtable's IsCash field.
The expected result is flexible, because I just want any random value from testtable's IsCash field to appear in the Cash field, as shown below:
| SEDOL | ISIN | Cash |
===============================
|ZZZ0072 |GB00B7JYLW09| 200952|
-------------------------------
| CASH |GB00B7JYLW09| 200000|
-------------------------------
|ZZZ0072 |GB00B7JYLW09| 150000|
-------------------------------
|ZZZ0009 |GB00B7JYLW09| 1190 |
-------------------------------
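A sketch of one way to do this, demonstrated here with SQLite so it is runnable — COALESCE stands in for T-SQL's ISNULL, and ORDER BY RANDOM() for ORDER BY NEWID(); the table and column names follow the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE results (SEDOL TEXT, ISIN TEXT, Cash INTEGER)")
cur.executemany("INSERT INTO results VALUES (?, ?, ?)", [
    ("ZZZ0072", "GB00B7JYLW09", None),
    ("CASH",    "GB00B7JYLW09", None),
    ("ZZZ0072", "GB00B7JYLW09", None),
    ("ZZZ0009", "GB00B7JYLW09", None),
])
cur.execute("CREATE TABLE testtable (IsCash INTEGER)")
cur.executemany("INSERT INTO testtable VALUES (?)",
                [(36,), (150146,), (182,), (2000,), (1190,)])

# Replace NULL Cash with a value from testtable; the T-SQL equivalent would be
# ISNULL(Cash, (SELECT TOP 1 IsCash FROM testtable ORDER BY NEWID()))
rows = cur.execute("""
    SELECT SEDOL, ISIN,
           COALESCE(Cash,
                    (SELECT IsCash FROM testtable ORDER BY RANDOM() LIMIT 1))
    FROM results
""").fetchall()
for row in rows:
    print(row)
```

Note that an uncorrelated subquery may be evaluated only once per statement, giving every row the same replacement value; since the question accepts any value from IsCash, that still satisfies the requirement.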
I am trying to add a displayName to my YAML code when deploying the pipeline, but when I run the pipeline the displayName does not appear.
I expect the display name "PreDeployment" to appear:
- task: AzurePowerShell@5
  inputs:
    displayName: PreDeployment
    azureSubscription: 'NewConnectionName'
    ScriptType: 'FilePath'
    ScriptPath: '$(System.DefaultWorkingDirectory)/caplogic-warehouse-dev-df/PrePostDeploymentScript.ps1'
    ScriptArguments: '-armTemplate "$(System.DefaultWorkingDirectory)/caplogic-warehouse-dev-df/ARMTemplateForFactory.json" -ResourceGroupName $(ResourceGroup) -DataFactoryName $(DataFactory) -predeployment $true -deleteDeployment $false'
    azurePowerShellVersion: 'LatestVersion'
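For comparison: in Azure Pipelines YAML, displayName is a property of the task step itself, not one of its inputs, so moving it up one level is the usual fix. A sketch with the same task (remaining inputs unchanged and abbreviated here):

```yaml
- task: AzurePowerShell@5
  displayName: PreDeployment   # step-level property, not under inputs
  inputs:
    azureSubscription: 'NewConnectionName'
    ScriptType: 'FilePath'
    azurePowerShellVersion: 'LatestVersion'
```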
We are experiencing extremely slow Databricks SQL queries. I found a site that describes a number of Spark SQL optimization and tuning techniques:
https://www.linkedin.com/pulse/spark-sql-performance-tuning-configurations-vignesan-saravanan-8hamc/
Many of the suggestions described in the link indicate that these features are enabled by default. For example, Spark's cost-based optimizer is enabled by default. However, it also mentions that if it is not enabled, you can enable it by running:
spark.conf.set("spark.sql.cbo.enabled", true)
My question is
I have a T-SQL query with a correlated column. When I try to execute the query with Databricks SQL, I get the error:
Error in SQL statement: AnalysisException: Correlated column is not allowed in a non-equality predicate:
Aggregate [max(DateOfChange#20975) AS max(DateOfChange)#20988]
+- Filter (((SiiAuditTypeID#20976 = 3) AND (SplitID#20973 = outer(SplitID#20825))) AND outer(StatusID#20829) IN (1,2))
+- SubqueryAlias sia
+- SubqueryAlias spark_catalog.dbo.SiiAudit
+- Relation[SiiAuditID#20972,SplitID#20973,AccountID#20974,DateOfChange#20975,SiiAuditTypeID#20976,UserID#20977,UserName#20978,PortfolioID#20979,PortfolioSplit#20980,SiiAuditEntryKindID#20981,ModelPortfolioID#20982,primary_key_hash#20983,change_key_hash#20984,reject_reason#20985,reject_row#20986] parquet
;
Distinct
+- Project [AccountID#20826, CreatedDate#20827, StatusID#20829, scalar-subquery#20691 [SplitID#20825 && StatusID#20829] AS CancelledDate#20692]
: +- Aggregate [max(DateOfChange#20975) AS max(DateOfChange)#20988]
: +- Filter (((SiiAuditTypeID#20976 = 3) AND (SplitID#20973 = outer(SplitID#20825))) AND outer(StatusID#20829) IN (1,2))
: +- SubqueryAlias sia
: +- SubqueryAlias spark_catalog.dbo.SiiAudit
Can someone help refactor the T-SQL so that it executes without the correlated subquery?
The query is as follows:
SELECT DISTINCT
s.AccountID,
s.CreatedDate,
s.StatusID,
(
SELECT MAX(DateOfChange) AS CancelledDate
FROM CRM.InvestmentInstruction.SiiAudit sia
WHERE sia.SiiAuditTypeID = 3 -- cancelled
AND sia.SplitID = s.SplitID
AND s.StatusID IN ( 1, 2 ) -- cancelled, completed
) AS CancelledDate
FROM CRM.InvestmentInstruction.Split s
INNER JOIN CRM.InvestmentInstruction.SplitPortfolio sp
ON sp.SplitID = s.SplitID
INNER JOIN CRM.InvestmentInstruction.InvestmentRequest ir
ON ir.InvestmentRequestID = sp.InvestmentRequestID
INNER JOIN CRM.dbo.ModelPortfolio mp
ON mp.ModelPortfolioID = ir.ModelID
INNER JOIN
(
SELECT DISTINCT
mh.ModelPortfolioID
FROM CRM.dbo.modelHolding mh
INNER JOIN Securities.dbo.Security sec
ON sec.SecurityID = mh.LinkSecurityId
WHERE sec.IsCashSecurity = 0
) mh
ON mh.ModelPortfolioID = mp.ModelPortfolioID
WHERE s.TypeID = 0
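One standard refactor (a sketch, shown on a cut-down two-table version of the schema that ignores the other joins): pre-aggregate SiiAudit by SplitID, LEFT JOIN the result on, and move the s.StatusID IN (1, 2) condition into a CASE, since in the original that condition empties the subquery whenever it is false. Demonstrated here with SQLite to show both forms return the same rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Split (SplitID INT, AccountID INT, StatusID INT)")
cur.executemany("INSERT INTO Split VALUES (?, ?, ?)",
                [(1, 100, 1), (2, 101, 3), (3, 102, 2)])
cur.execute("CREATE TABLE SiiAudit "
            "(SplitID INT, SiiAuditTypeID INT, DateOfChange TEXT)")
cur.executemany("INSERT INTO SiiAudit VALUES (?, ?, ?)",
                [(1, 3, "2024-01-05"), (1, 3, "2024-02-01"),
                 (3, 3, "2024-03-01"), (2, 3, "2024-01-09"),
                 (1, 2, "2024-04-01")])

# Original correlated form (runs in SQLite, rejected by Databricks SQL)
correlated = cur.execute("""
    SELECT s.AccountID, s.StatusID,
           (SELECT MAX(DateOfChange) FROM SiiAudit sia
            WHERE sia.SiiAuditTypeID = 3
              AND sia.SplitID = s.SplitID
              AND s.StatusID IN (1, 2)) AS CancelledDate
    FROM Split s ORDER BY s.AccountID
""").fetchall()

# Decorrelated rewrite: aggregate once, join on SplitID, gate with CASE
rewritten = cur.execute("""
    SELECT s.AccountID, s.StatusID,
           CASE WHEN s.StatusID IN (1, 2) THEN m.MaxDate END AS CancelledDate
    FROM Split s
    LEFT JOIN (SELECT SplitID, MAX(DateOfChange) AS MaxDate
               FROM SiiAudit WHERE SiiAuditTypeID = 3
               GROUP BY SplitID) m ON m.SplitID = s.SplitID
    ORDER BY s.AccountID
""").fetchall()

# Both queries should produce identical rows
print(correlated == rewritten)
```

Applied to the full query, the scalar subquery on CRM.InvestmentInstruction.SiiAudit becomes the same kind of derived table joined on SplitID, with the CASE around m.MaxDate, while the rest of the joins stay as they are.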