我有一个由分号和逗号分隔的旧数据源列。第一个分号表示姓氏,第二个表示名字和中间名(或首字母),最后一个分号表示个人类型。逗号表示新名称已开始。这是此数据的示例。
+-------+---------------------------------------------------------------------------------------------------------------------+
| ID | SOURCE |
+-------+---------------------------------------------------------------------------------------------------------------------+
| 62963 | RENZ;MICHAEL;DECEASED,WANDER;MARIA;MINOR,WANDER;HENRY RUDOLPH;MINOR,WANDER;ROSA;MINOR,WANDER;PAUL EMIL;MINOR |
| 62964 | HERNDON;A C;ESTATE,BERRING;A F;DECEASED,BEIRING;A F;DECEASED,BEIRING;ANDREAS FREDERICK;DECEASED |
| 62965 | ZINCH;;ESTATE,ZINTZ;;ESTATE,HAYNES;HENRY;DECEASED |
| 62965 | ZINCH;;ESTATE,ZINTZ;;ESTATE,HAYNES;HENRY;DECEASED |
| 62966 | KRAUS;JOSEPHINE;MINOR,KENNEDY;GEORGE;DECEASED |
| 62967 | CAREY;JAMES;ESTATE,DE LA GARZA;REFUGIO;DECEASED |
| 62968 | LEWIS;FLORENCE;ESTATE,LOCKWOOD;ALBERT A;DECEASED |
| 62969 | GLAESER;EMMA;MINOR,GLAESER;HERMAN JR;MINOR,GLAESER;HERMAN;MINOR,RODRIGUEZ;HILARIO;DECEASED,RODRIGUEZ;MARIE;DECEASED |
| 62970 | STORY;BETTIE;ESTATE,EIGENDORFF;FRANZ;DECEASED |
| 62971 | HOWELL;MAMIE;MINOR,HOWELL;ETHEL;MINOR |
+-------+---------------------------------------------------------------------------------------------------------------------+
我试图以这样的方式提取数据,以便它可以适应不同的模式:
+-----------+------------+-------------+-------------------+----------+
| ID | SEQUENCE | LAST | FIRSTMIDDLE | TYPE |
+-----------+------------+-------------+-------------------+----------+
| 62963 | 1 | RENZ | MICHAEL | DECEASED |
| 62963 | 2 | WANDER | MARIA | MINOR |
| 62963 | 3 | WANDER | HENRY RUDOLPH | MINOR |
| 62963 | 4 | WANDER | ROSA | MINOR |
| 62963 | 5 | WANDER | PAUL EMIL | MINOR |
| 62964 | 1 | HERNDON | A C | ESTATE |
| 62964 | 2 | BERRING | A F | DECEASED |
| 62964 | 3 | BEIRING | A F | DECEASED |
| 62964 | 4 | BEIRING | ANDREAS FREDERICK | DECEASED |
| 62965 | 1 | ZINCH | | ESTATE |
| 62965 | 2 | ZINTZ | | ESTATE |
| 62965 | 3 | HAYNES | HENRY | DECEASED |
| 62966 | 1 | KRAUS | JOSEPHINE | MINOR |
| 62966 | 2 | KENNEDY | GEORGE | DECEASED |
| 62967 | 1 | CAREY | JAMES | ESTATE |
| 62967 | 2 | DE LA GARZA | REFUGIO | DECEASED |
| 62968 | 1 | LEWIS | FLORENCE | ESTATE |
| 62968 | 2 | LOCKWOOD | ALBERT A | DECEASED |
| 62969 | 1 | GLAESER | EMMA | MINOR |
| 62969 | 2 | GLAESER | HERMAN JR | MINOR |
| 62969 | 3 | GLAESER | HERMAN | MINOR |
| 62969 | 4 | RODRIGUEZ | HILARIO | DECEASED |
| 62969 | 5 | RODRIGUEZ | MARIE | DECEASED |
| 62970 | 1 | STORY | BETTIE | ESTATE |
| 62970 | 2 | EIGENDORFF | FRANZ | DECEASED |
| 62971 | 1 | HOWELL | MAMIE | MINOR |
| 62971 | 2 | HOWELL | ETHEL | MINOR |
+-----------+------------+-------------+-------------------+----------+
这种类型的数据提取是我不太熟悉的。我想我需要使用 和 的复杂组合SUBSTRING
,CHARINDEX
但鉴于源列可以包含的条目数量各不相同,我不确定如何最好地解决这个问题。任何关于我应该从哪里开始的指导都会非常有帮助。
只要遗留数据如所述并且没有任何奇怪的变化,以下应该可以工作,因为它会产生所需的输出。
笔记:
WHILE
循环的拆分器)。测试设置:
主要查询:
我对这种事情使用流式表值函数:
ReadDelimited 将返回 A:CZ 列的 Excel 样式结果集。然后我们
select
只使用 A、C 和 Q 列,并使用标准 T-SQL 添加正确的类型和别名。如果您的数据包含标题,那么您可以使用Row_Number()
. 该示例还演示了如何轻松地将 STVF 与 Sql Server 的 FileTables 功能结合起来。在您的情况下,您需要设置
@delimitChars = ';'
and@newLineChars = ','
。C# CLR 代码: