To extract data from an on-premises SQL database with Azure Data Factory, we configure a self-hosted integration runtime in ADF and follow the necessary steps to connect to the on-premises SQL database and pull the data.
Can someone show me how to extract data from an on-premises server using Azure Fabric Data Pipelines?
I am trying to connect from Databricks to our Azure SQL DB with JDBC configured as follows:
DBUser = 'test2'
DBPword = 'xxxxxx'
DBServer = 'hst37dg5zxxxxxxy-exnwgcizvwnurfcoqllpyaj3q4'
DBDatabase = 'newconfigdb-xxxxxxxxxxx8-a7ea-13f21e4ab25b'

jdbcUrl = f"jdbc:sqlserver://{DBServer}.database.fabric.microsoft.com:1433;database={DBDatabase};user={DBUser};password={DBPword};encrypt=true;trustServerCertificate=false;authentication=ActiveDirectoryPassword"

df.write.mode("overwrite") \
    .format("jdbc") \
    .option("url", jdbcUrl) \
    .option("dbtable", table) \
    .save()
I get the following error:
com.microsoft.sqlserver.jdbc.SQLServerException: Failed to authenticate the user test2 in Active Directory (Authentication=ActiveDirectoryPassword). AADSTS50034: The user account 'EUII Hidden' does not exist in the 8cbfa73c-xxxxxxx8faef12fc6 directory. To sign into this application, the account must be added to the directory.
Can someone tell me what "EUII Hidden" means? And how do I fix this?
Update: when I enter DBUser = '[email protected]', I get the error "Failed to authenticate the user [email protected] in Active Directory (Authentication=ActiveDirectoryPassword). AADSTS50055: The password is expired."
However, when I created the account in Azure I was not given the option to require the user to set a new password at sign-in, so I am not sure why I am getting a password-expired error.
I think I have now mostly resolved this issue.
I have changed the password, but when I try to execute the code I get the error:
com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near '<'.
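For what it's worth, "Incorrect syntax near '<'" usually means the text sent to SQL Server literally contains a < character, for example an unreplaced placeholder such as <table> ending up in the dbtable option. A minimal check, assuming table is meant to hold a plain table name (dbo.MyTable below is only a stand-in):

table = "dbo.MyTable"  # hypothetical concrete target; substitute your real table name
print(repr(table))     # confirm the value contains no '<...>' placeholder

df.write.mode("overwrite") \
    .format("jdbc") \
    .option("url", jdbcUrl) \
    .option("dbtable", table) \
    .save()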
Using the following PySpark code, I successfully mounted the Azure OneLake storage account. However, when I try to list the mounted path with display(dbutils.fs.ls('/mnt/lake')),
I get the following error:
Operation failed: "Forbidden", 403, GET, https://onelake.dfs.fabric.microsoft.com/DataEngineeringWKSP?upn=false&resource=filesystem&maxResults=5000&directory=my_lakehouse.Lakehouse&timeout=90&recursive=false, Forbidden, "User is not authorized to perform current operation for workspace 'xxxxxx-ad19-489b-944e-82d6fc013b87' and artifact 'xxxxx-3c39-44b8-8982-ddecef9e829c'"
I get a similar error when I try to read a file in the OneLake account:
Operation failed: "Forbidden", 403, HEAD, https://onelake.dfs.fabric.microsoft.com/DataEngineeringWKSP/sqlite_lakehouse.Lakehouse/Files/expdata.csv?upn=false&action=getStatus&timeout=90
The code I used to mount the OneLake storage account is as follows:
url = "abfss://[email protected]/sqlite_lakehouse.Lakehouse"
mount_folder = "/mnt/lake"
# OAuth configuration settings for OneLake
configs = {
"fs.azure.account.auth.type.onelake.dfs.fabric.microsoft.com": "OAuth",
"fs.azure.account.oauth.provider.type.onelake.dfs.fabric.microsoft.com": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
"fs.azure.account.oauth2.client.id.onelake.dfs.fabric.microsoft.com": "xxxxxx-a061-4899-994b-81253d864bc8",
"fs.azure.account.oauth2.client.secret.onelake.dfs.fabric.microsoft.com": "xxxxxx~1Q.B-Ey12zs066D_G3.E6bslnE_LqY-aFs",
"fs.azure.account.oauth2.client.endpoint.onelake.dfs.fabric.microsoft.com": "https://login.microsoftonline.com/xxxxxxxxxxxxxf12fc6/oauth2/token"
}
mounted_list = dbutils.fs.mounts()
mounted_exist = False
for item in mounted_list:
if mount_folder in item.mountPoint:
mounted_exist = True
break
if not mounted_exist:
dbutils.fs.mount(source=url, mount_point=mount_folder, extra_configs=configs)
I think I need to add permissions in the Azure Fabric workspace, but I am struggling to find exactly where to add them.
Can someone show me how to mount Azure Fabric OneLake?
When mounting ADLS from Databricks, I use the following code:
container_name = "root"
storage_account = "xxxxxxxxx"
key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxQ=="
url = "wasbs://" + container_name + "@" + storage_account + ".blob.core.windows.net/"
config = "fs.azure.account.key." + storage_account + ".blob.core.windows.net"
mount_folder = "/mnt/path"
mounted_list = dbutils.fs.mounts()
mounted_exist = False
for item in mounted_list:
if mount_folder in item[0]:
mounted_exist = True
break
if not mounted_exist:
dbutils.fs.mount(source = url, mount_point = mount_folder, extra_configs = {config : key})
I tried a similar approach to mount Azure Fabric OneLake, as follows:
url = "abfss://[email protected]/my_lakehouse.Lakehouse"
mount_folder = "/mnt/path"
mounted_list = dbutils.fs.mounts()
mounted_exist = False
for item in mounted_list:
if mount_folder in item[0]:
mounted_exist = True
break
if not mounted_exist:
dbutils.fs.mount(source = url, mount_point = mount_folder)
However, the above fails because it is still trying to mount ADLS Gen2 storage when it should be mounting OneLake storage.
Any ideas?
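For comparison, the OneLake mount earlier in this post passed OAuth settings keyed to the onelake.dfs.fabric.microsoft.com host in extra_configs; without them the ABFS driver falls back to its ADLS Gen2 defaults, which matches the failure described above. A sketch reusing that pattern, where the client id, secret, and tenant values are placeholders for a service principal that has access to the workspace:

# Hypothetical sketch: tell the ABFS driver how to authenticate against the OneLake host
configs = {
    "fs.azure.account.auth.type.onelake.dfs.fabric.microsoft.com": "OAuth",
    "fs.azure.account.oauth.provider.type.onelake.dfs.fabric.microsoft.com": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id.onelake.dfs.fabric.microsoft.com": "<application-id>",     # placeholder
    "fs.azure.account.oauth2.client.secret.onelake.dfs.fabric.microsoft.com": "<client-secret>",  # placeholder
    "fs.azure.account.oauth2.client.endpoint.onelake.dfs.fabric.microsoft.com": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
}

if not mounted_exist:
    dbutils.fs.mount(source=url, mount_point=mount_folder, extra_configs=configs)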
I have the following Azure Data Factory expression:
@{pipeline().parameters.Zone}/@{pipeline().parameters.Classification}/@{pipeline().parameters.Area}/@{pipeline().parameters.Domain}/@{pipeline().parameters.TableName}
How do I add the extension .csv to TableName? I tried the following:
@{pipeline().parameters.Zone}/@{pipeline().parameters.Classification}
/@{pipeline().parameters.Area}/@{pipeline().parameters.Domain}
/@{pipeline().parameters.TableName,'.csv'}
But I received the error:
Parameter TableName,'.csv' was not found
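As far as I understand the ADF expression language, an @{...} interpolation block takes a single expression, so the extension has to be appended with concat() rather than with a comma. A sketch of what I would expect to work:

@{pipeline().parameters.Zone}/@{pipeline().parameters.Classification}/@{pipeline().parameters.Area}/@{pipeline().parameters.Domain}/@{concat(pipeline().parameters.TableName, '.csv')}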
When I try to mount an ADLS Gen2 storage account with the following code, I get the error:
IllegalArgumentException: Unsupported Azure Scheme: abfss
container_name = "mycontainer"
storage_account = "MyStorageAccount"
key = "xxxxxxxxxx=="
url = "abfss://" + container_name + "@" + storage_account + ".dfs.core.windows.net/"
config = "fs.azure.account.key." + storage_account + ".dfs.core.windows.net"
mount_folder = "/mnt/lake"
mounted_list = dbutils.fs.mounts()
mounted_exist = False
for item in mounted_list:
if mount_folder in item[0]:
mounted_exist = True
break
if not mounted_exist:
dbutils.fs.mount(source = url, mount_point = mount_folder, extra_configs = {config : key})
I have successfully mounted an ADLS Gen2 account from Databricks with this method in the past, so I am not sure why I am getting this error.
To update this question: our current environment prevents us from creating app registrations, and therefore from creating service principals. That is why I am trying to mount the storage account with an account key.
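One possible workaround that needs no app registration: instead of mounting, hand the account key to the ABFS driver through Spark configuration and read directly over abfss. A sketch reusing the storage_account, container_name, and key variables above (the file path is only an illustration):

# Hypothetical sketch: configure account-key auth for the ABFS driver, then read directly
spark.conf.set(
    "fs.azure.account.key." + storage_account + ".dfs.core.windows.net",
    key
)
df = spark.read.csv(
    "abfss://" + container_name + "@" + storage_account + ".dfs.core.windows.net/path/to/file.csv",  # illustrative path
    header=True
)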
I have the following configuration in the pipeline expression builder of a Copy activity:
@concat('SELECT * FROM ', pipeline().parameters.Domain,
'.', pipeline().parameters.TableName)
This successfully copies the data into our SQL Server table dbo.MyTable.
I would like to add a suffix, or some additional characters, to the table name so that the data is copied into the SQL Server table dbo.MyTableV2 instead.
Can someone show me how to add the extra characters to the table name?
For example, this does not work:
@concat('SELECT * FROM ', pipeline().parameters.Domain,
'.', pipeline().parameters.TableName, 'V2')
Any ideas?
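One thing worth checking: the concat() above only rewrites the source SELECT, i.e. which table is read; the destination table is governed by the sink dataset. Assuming the sink dataset exposes a table-name parameter fed from the pipeline, a sketch of the expression I would put on that sink parameter instead:

@concat(pipeline().parameters.TableName, 'V2')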
I have the following query in an ADF Copy activity:
SELECT
deltaTable.*
FROM Data.deltaTable
LEFT OUTER JOIN Data.targetTable
ON deltaTable.signature = targetTable.signature
WHERE targetTable.signature IS NULL
Can someone show me how to parameterize the query? When I try to parameterize it, I get the error:
Parameter schema was not found under EX_SourceToRaw_Single_Table
The following code is my attempt:
@concat('SELECT * FROM ',pipeline().parameters.schema,'.',pipeline().parameters.DeltaTable)
LEFT OUTER JOIN pipeline().parameters.schema,'.',pipeline().parameters.TargetTable)
ON pipeline().parameters.DeltaTable).signature = pipeline().parameters.TargetTable).signature
WHERE pipeline().parameters.TargetTable).signature IS NULL
Both deltaTable and TargetTable look like the following:
| CountryName | CountryISO2 | CountryISO3 | SalesRegion | signature                                                        |
|-------------|-------------|-------------|-------------|------------------------------------------------------------------|
| Belgium     | CHA         | 10          | EMEA        | 800e559a27d68f0478b61c4c9f009e2418e866971b6e54b549b51b1367ab450d |
| Wales       | steveO      | WAL         | Welsh       | e8c5149d54986dfe9ac95a60a76b07603fe17c282b552ec8255f123b279a533a |
| Germany     | DE          | deletedupd  | EMEA        | 1232b1bd91d14a87ed830f770d74cd8cabb871535c4c2b7ff5bcb873fa80d851 |
| Italy       | IT          | ITA         | EMEA        | 584cf66de2f4af9eb4dbfebefea808b1b4e6a35787fcac1061de88cfb79856df |
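On the parameterization itself: the error "Parameter schema was not found" suggests the pipeline has no parameter literally named schema, so it must first be declared on the pipeline (names are matched case-sensitively) alongside DeltaTable and TargetTable. The whole statement, including the JOIN and WHERE clauses, also has to live inside a single concat() call. A sketch under those assumptions:

@concat('SELECT deltaTable.* FROM ',
    pipeline().parameters.schema, '.', pipeline().parameters.DeltaTable, ' AS deltaTable',
    ' LEFT OUTER JOIN ',
    pipeline().parameters.schema, '.', pipeline().parameters.TargetTable, ' AS targetTable',
    ' ON deltaTable.signature = targetTable.signature',
    ' WHERE targetTable.signature IS NULL')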
Can someone show me how to add a cryptographic hash to a field in Azure Data Factory?
For example, I have an existing table to which I want to add an additional column called "signature", and I want to populate the "signature" column with a SHA-256 hash.
I know it is easy to add a column to a table in an ADF Copy activity (see the image below), but I do not know how to fill that column with a cryptographic hash value.
I tried to modify my query to apply the hash, but I get a syntax error: missing comma between parameters.
The original query is as follows:
@concat('SELECT * FROM ',pipeline().parameters.Domain,'.',pipeline().parameters.TableName)
The new query modifies the above as follows:
@concat('SELECT *, HASHBYTES('SHA2_256', CAST(signature AS NVARCHAR(MAX))) AS Signature FROM ',pipeline().parameters.Domain,'.',pipeline().parameters.TableName)
However, I am not sure where the missing comma should go.
I thought I had fixed the query with the following:
@concat('SELECT *, HASHBYTES(SHA2_256, , CAST(signature AS NVARCHAR(MAX))) AS Signature FROM ',pipeline().parameters.Domain,'.',pipeline().parameters.TableName)
However, when I execute the Copy activity, I get the following error:
Details: A failure happened on the 'Source' side. 'Type=Microsoft.Data.SqlClient.SqlException,Message=Incorrect syntax near ','.,Source=Framework Microsoft SqlClient Data Provider,'
I have modified the Copy activity as shown below, but I still get the same error.
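I suspect the root cause is quoting: inside an ADF string literal a single quote is escaped by doubling it, so the 'SHA2_256' in the first attempt terminated the concat argument early (hence "missing comma between parameters"), and the second attempt dropped the quotes and left a stray comma, which SQL Server rejected. A sketch with the quotes doubled (the CAST(signature ...) input is kept from the query above as-is; presumably it should reference whichever columns you want fingerprinted):

@concat('SELECT *, HASHBYTES(''SHA2_256'', CAST(signature AS NVARCHAR(MAX))) AS Signature FROM ', pipeline().parameters.Domain, '.', pipeline().parameters.TableName)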
I am learning with Databricks Academy. Databricks comes with data stored in adl that can be used for the training.
However, the data appears to be inaccessible. We get the following error:
com.microsoft.azure.datalake.store.ADLException: Error getting info for file /dbacademy/people10m.parquet
The location of the data is:
people10m = spark.read.parquet("adl://devszendsadlsrdpacqncd.azuredatalakestore.net/dbacademy/people10m.parquet")
Can someone explain why we cannot access the data?
To explain the problem more clearly, the following link shows a Databricks notebook for learning aggregations, JOINs, and nested queries. To work through the notebook, you first need to run the classroom setup with: %run "./Includes/Classroom-Setup"
This executes the following code in a notebook called "Classroom-Setup":
people10m = spark.read.parquet("adl://devszendsadlsrdpacqncd.azuredatalakestore.net/dbacademy/people10m.parquet")
However, when the notebook runs the code, I get the following error:
com.microsoft.azure.datalake.store.ADLException: Error getting info for file /dbacademy/people10m.parquet
So could someone let me know why I am getting the error, and suggest a workaround?
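One explanation to consider: the adl:// scheme is Azure Data Lake Storage Gen1, which Microsoft has since retired, and even before retirement a cluster could only read it when Gen1 OAuth credentials were configured. For reference, the classic Hadoop settings look like the sketch below (all three values are placeholders; since you cannot self-serve credentials for the Academy's storage account, the practical fix is usually to switch to the course's updated datasets):

# Hypothetical sketch: client-credential configuration for the ADLS Gen1 (adl://) driver
spark.conf.set("fs.adl.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("fs.adl.oauth2.client.id", "<application-id>")  # placeholder
spark.conf.set("fs.adl.oauth2.credential", "<client-secret>")  # placeholder
spark.conf.set("fs.adl.oauth2.refresh.url", "https://login.microsoftonline.com/<tenant-id>/oauth2/token")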
Can someone tell me whether it is possible to copy all tables, stored procedures, and views from one SQL DB to an Azure SQL DB with a single Copy activity?
For the source dataset I have the following Copy activity:
I believe the above will copy and create all the tables, but I am not sure whether it also copies and creates the stored procedures, views, etc.
Following the answer provided by @Bhavani, can someone show me how to
add source and sink datasets with two string parameters, Schema and Table, defining them as @dataset().Schema for the schema and @dataset().Table for the table?
I have added Schema and TableName as described (see the image), but I get the error "Table is required for Copy activity".
I am nearly there. I fixed the "Table is required for Copy activity" error. Now I get the error "The expression 'length(activity('Lookup1').output.value)' cannot be evaluated because property 'value' doesn't exist, available properties are 'firstRow, effectiveIntegrationRuntime, billingReference, durationInQueue'."
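Judging by the available properties listed (firstRow, ...), the Lookup activity appears to have "First row only" enabled, in which case its output exposes firstRow rather than value. If the intent is to iterate over all returned rows, unchecking "First row only" should make the array available, after which the original expression evaluates:

@length(activity('Lookup1').output.value)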
The following Python is intended to generate an ERD using Visual Studio Code.
The diagram is created locally with matplotlib. The code executes without any errors, but the ERD figure appears blank.
The Python code is as follows:
import matplotlib.pyplot as plt

# Define the entities and their attributes for the ERD
entities = {
    "Customer": ["CustomerID (PK)", "CustomerName", "ContactInfo"],
    "CreditCardAccount": ["AccountID (PK)", "AccountStatus", "Balance", "CustomerID (FK)"],
    "CreditCard": ["CardID (PK)", "CardNumber", "ExpiryDate", "AccountID (FK)", "BrandID (FK)"],
    "CreditCardBrand": ["BrandID (PK)", "BrandName", "CardType"],
    "SecondaryCardHolder": ["SecondaryHolderID (PK)", "HolderName", "RelationToPrimary", "AccountID (FK)"],
    "PurchaseTransaction": ["TransactionID (PK)", "TransactionDate", "Amount", "CardID (FK)", "RetailerID (FK)"],
    "Retailer": ["RetailerID (PK)", "RetailerName", "Location"],
    "MonthlyStatement": ["StatementID (PK)", "StatementDate", "OutstandingBalance", "AccountID (FK)"],
    "CustomerServiceInteraction": ["InteractionID (PK)", "InteractionDate", "Notes", "CustomerID (FK)"],
}

# Relationships between entities
relationships = [
    ("Customer", "CreditCardAccount", "1:M"),
    ("CreditCardAccount", "CreditCard", "1:M"),
    ("CreditCard", "CreditCardBrand", "M:1"),
    ("CreditCardAccount", "SecondaryCardHolder", "1:M"),
    ("CreditCard", "PurchaseTransaction", "1:M"),
    ("PurchaseTransaction", "Retailer", "M:1"),
    ("CreditCardAccount", "MonthlyStatement", "1:M"),
    ("Customer", "CustomerServiceInteraction", "1:M"),
]

# Plotting the ERD
fig, ax = plt.subplots(figsize=(12, 8))

# Define positions for the entities
positions = {
    "Customer": (1, 5),
    "CreditCardAccount": (4, 5),
    "CreditCard": (7, 5),
    "CreditCardBrand": (10, 5),
    "SecondaryCardHolder": (4, 3),
    "PurchaseTransaction": (7, 3),
    "Retailer": (10, 3),
    "MonthlyStatement": (4, 1),
    "CustomerServiceInteraction": (1, 3),
}

# Draw entities as boxes
for entity, position in positions.items():
    plt.text(position[0], position[1], f"{entity}\n" + "\n".join(entities[entity]),
             ha='center', va='center', bbox=dict(facecolor='lightblue', edgecolor='black', boxstyle='round,pad=0.5'))

# Draw relationships as lines
for rel in relationships:
    start_pos = positions[rel[0]]
    end_pos = positions[rel[1]]
    ax.annotate("",
                xy=end_pos, xycoords='data',
                xytext=start_pos, textcoords='data',
                arrowprops=dict(arrowstyle="->", lw=1.5, color='black'),
                )
    # Add cardinality
    midpoint = ((start_pos[0] + end_pos[0]) / 2, (start_pos[1] + end_pos[1]) / 2)
    ax.text(midpoint[0], midpoint[1], rel[2], ha='center', va='center', fontsize=10)

# Hide axes
ax.set_axis_off()

# Show the ERD diagram
plt.title("Entity Relationship Diagram (ERD) for Credit Card Company", fontsize=16)
plt.show()
Can someone tell me why the ERD does not appear?
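A guess at two causes. First, matplotlib text and annotations do not participate in autoscaling, so with the axes hidden the view can stay at the default (0, 1) data limits and every box lands outside the visible area. Second, under VS Code a non-interactive backend can make plt.show() a no-op. A sketch of what I would try just before plt.show(), with limits chosen to enclose the positions dictionary above and savefig as a backend-independent fallback (the file name is arbitrary):

ax.set_xlim(0, 11)  # widen the view to enclose x positions 1..10
ax.set_ylim(0, 6)   # and y positions 1..5
plt.savefig("erd.png", dpi=150, bbox_inches="tight")  # written even if show() displays nothing
plt.show()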
I am trying to create a Python wheel for use with Databricks. I am using VS Code to generate the wheel.
I have the following setup.py file:
import setuptools

with open("QuickStart.MD", "r") as fh:
    long_description = fh.read()

setuptools.setup(
    name="newlib",
    version="0.1.8",
    author="name",
    author_email="[email protected]",
    description="framework",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="",
    packages=setuptools.find_packages(),
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
    python_requires='>=3.6',
    install_requires=[
        'pyodbc',
        'jsonschema'
    ]
)
My README.md file looks like this:
# Example Package
This is a simple example package. You can use
[Github-flavored Markdown](https://guides.github.com/features/mastering-markdown/)
to write your content
When I run python setup.py bdist_wheel,
I get the error:
invalid command 'bdist_wheel'
Can someone let me know where I am going wrong?
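In my experience "invalid command 'bdist_wheel'" simply means the wheel package is missing from the Python environment running setup.py, so the bdist_wheel command is never registered. What I would try first, in the same environment/venv that VS Code uses:

pip install --upgrade setuptools wheel
python setup.py bdist_wheel

Separately, note that setup.py opens QuickStart.MD while the file described is README.md; the open() call has to point at a file that actually exists, or the build fails with FileNotFoundError.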