Estamos migrando do SQL 2005 [instância e banco de dados com agrupamento de SQL_Latin1_General_CP1_CI_AS
] para SQL 2008 [cujo padrão é Latin1_General_CI_AS
].
Concluí uma instalação do SQL 2008 R2 e usei o Latin1_General_CI_AS
agrupamento padrão, com a restauração do banco de dados ainda em SQL_Latin1_General_CP1_CI_AS
. Ocorreram os problemas de exceção - as tabelas #temp foram inseridas Latin1_General_CI_AS
enquanto o banco de dados estava SQL_Latin1_General_CP1_CI_AS
e é aqui que estou agora - preciso de conselhos sobre as armadilhas agora, por favor.
Na instalação do SQL 2008 R2, tenho a opção de instalação 'SQL Collation, used for backwards compatibility'
onde tenho a opção de selecionar o mesmo agrupamento do banco de dados 2005: SQL_Latin1_General_CP1_CI_AS
.
Isso me permitirá não ter problemas com tabelas #temp, mas existem armadilhas?
Eu perderia alguma funcionalidade ou recurso de qualquer tipo por não usar um agrupamento "atual" do SQL 2008?
- E quando passarmos (por exemplo, em 2 anos) de 2008 para o SQL 2012? Terei problemas então?
Em algum momento eu seria forçado a ir
Latin1_General_CI_AS
?Eu li que alguns scripts de DBA completam as linhas de bancos de dados completos e, em seguida, executam o script de inserção no banco de dados com o novo agrupamento - estou com muito medo e cauteloso com isso - você recomendaria fazer isso?
Em primeiro lugar, peço desculpas por uma resposta tão longa, pois sinto que ainda há muita confusão quando as pessoas falam sobre termos como agrupamento, ordem de classificação, página de código etc.
De BOL :
Isso significa que Collation é muito importante, pois especifica regras sobre como as cadeias de caracteres dos dados são classificadas e comparadas.
Nota: Mais informações sobre COLLATIONPROPERTY
Agora vamos primeiro entender as diferenças ......
Executando abaixo do T-SQL:
Os resultados seriam:
Olhando para os resultados acima, a única diferença é a ordem de classificação entre os 2 agrupamentos. Mas isso não é verdade, o que você pode ver abaixo:
Teste 1:
Resultados do Teste 1:
Nos resultados acima, podemos ver que não podemos comparar diretamente os valores em colunas com agrupamentos diferentes, você deve usar
COLLATE
para comparar os valores da coluna.TESTE 2:
A principal diferença é o desempenho, como aponta Erland Sommarskog nesta discussão no msdn .
--- Criar índices em ambas as tabelas
--- Execute as consultas
--- Isto terá Conversão IMPLÍCITA
--- Execute as consultas
--- Isso NÃO terá Conversão IMPLÍCITA
O motivo da conversão implícita é porque eu tenho meu agrupamento de banco de dados e servidor como
SQL_Latin1_General_CP1_CI_AS
e a tabela Table_Latin1_General_CI_AS tem comentários de coluna definidos comoVARCHAR(50)
COLLATE Latin1_General_CI_AS , portanto, durante a pesquisa, o SQL Server precisa fazer uma conversão IMPLICIT.Teste 3:
Com a mesma configuração, agora vamos comparar as colunas varchar com os valores nvarchar para ver as mudanças nos planos de execução.
-- executar a consulta
-- executar a consulta
Observe que a primeira consulta é capaz de fazer busca de índice, mas tem que fazer conversão implícita, enquanto a segunda faz uma varredura de índice que se mostra ineficiente em termos de desempenho quando varre tabelas grandes.
Conclusão:
SQL_Latin1_General_CP1_CI_AS
is a SQL collation with the rules that allow you to sort data for unicode and non-unicode are different.Latin1_General_CI_AS
is a Windows collation with the rules that allow you to sort data for unicode and non-unicode are same.See my answer above.
It all depends on what functionality/features you are referring to. Collation is storing and sorting of data.
Cant vouch ! As things might change in and its always good to be inline with Microsoft's suggestion + you need to understand your data and the pitfalls that I mentioned above. Also refer to this and this connect items.
When you want to change collation, then such scripts are useful. I have found myself changing collation of databases to match server collation many times and I have some scripts that does it pretty neat. Let me know if you need it.
References :
In addition to what @Kin detailed in his answer, there are a few more things to be aware of when switching the server's (i.e. instance's) default collation (items above the horizontal line are directly relevant to the two collations mentioned in the Question; items below the horizontal line are relevant to the general):
IF YOUR DATABASE'S DEFAULT COLLATION IS NOT CHANGING, then the "implicit conversion" performance issue described in @Kin's answer should not be a problem since string literals and local variables use the Database's default Collation, not the server's. The only impacts for the scenario in which the instance level Collation is changed but not the database level Collation are (both described in detail below):
One difference between these two Collations is in how they sort certain characters for
VARCHAR
data (this does not affectNVARCHAR
data). The non-EBCDICSQL_
Collations use what is called "String Sort" forVARCHAR
data, while all other Collations, and evenNVARCHAR
data for the non-EBCDICSQL_
Collations, use what is called "Word Sort". The difference is that in "Word Sort", the dash-
and apostrophe'
(and maybe a few other characters?) are given a very low weight and are essentially ignored unless there are no other differences in the strings. To see this behavior in action, run the following:Returns:
and:
While you will "lose" the "String Sort" behavior, I'm not sure that I would call that a "feature". It is a behavior that has been deemed undesirable (as evidenced by the fact that it wasn't brought forward into any of the Windows collations). However, it is a definite difference of behavior between the two collations (again, just for non-EBCDIC
VARCHAR
data), and you might have code and/or customer expectations based upon the "String Sort" behavior. This requires testing your code and possibly researching to see if this change in behavior might have any negative impact on users.Another difference between
SQL_Latin1_General_CP1_CI_AS
andLatin1_General_100_CI_AS
is the ability to do Expansions onVARCHAR
data (NVARCHAR
data can already do these for mostSQL_
Collations), such as handlingæ
as if it wereae
:Returns:
The only thing you are "losing" here is not being able to do these expansions. Generally speaking, this is another benefit of moving to a Windows Collation. However, just like with the "String Sort" to "Word Sort" move, the same caution applies: it is a definite difference of behavior between the two collations (again, just for
VARCHAR
data), and you might have code and/or customer expectations based upon not having these mappings. This requires testing your code and possibly researching to see if this change in behavior might have any negative impact on users.(first noted in this S.O. answer by @Zarepheth: Can SQL Server SQL_Latin1_General_CP1_CI_AS be safely converted to Latin1_General_CI_AS?)
The server-level collation is used to set the collation of the system databases, which includes
[model]
. The[model]
database is used as a template to create new databases, which includes[tempdb]
upon each server startup. But, even with a change of server-level collation changing the collation of[tempdb]
, there is a somewhat easy way to correct for collation differences between the database that is "current" whenCREATE #TempTable
is executed and[tempdb]
. When creating temporary tables, declare a collation using theCOLLATE
clause and specify a collation ofDATABASE_DEFAULT
:It is best to use the most recent version of the desired collation, if multiple versions are available. Starting in SQL Server 2005, a "90" series of collations was introduced, and SQL Server 2008 introduced a "100" series of collations. You can find these collations by using the following queries:
Since you are on SQL Server 2008 R2, you should use
Latin1_General_100_CI_AS
instead ofLatin1_General_CI_AS
.A difference between the case-sensitive versions of these particular collations (i.e.
SQL_Latin1_General_CP1_CS_AS
andLatin1_General_100_CS_AS
) is in the order of upper-case and lower-case letters when doing case-sensitive sorting. This also affects single-character class ranges (i.e.[start-end]
) that can be used with theLIKE
operator and thePATINDEX
function. The following three queries show this effect for both sorting and the character range.:The only way to get upper-case to sort before lower-case (for the same letter) is to use one of the 31 Collations that supports that behavior, which is the
Hungarian_Technical_*
Collations and a handful ofSQL_
Collations (which only support this behavior forVARCHAR
data).Less important for this particular change, but still good to know about since it would be impacting if changing the server to a binary or case-sensitive collation, is that the server level collation also affects:
sysname
datatypeMeaning, if you or "the programmer who left recently" who is apparently responsible for all bad code ;-) were not careful about casing and declared a variable as
@SomethingID
but then referred to it as@somethingId
later on, that would break if moving to a case-sensitive or binary collation. Similarly, code that uses thesysname
datatype but refers to it asSYSNAME
,SysName
, or something other than all lower-case will also break if moved to an instance using a case-sensitive or binary collation.