Brief Context
I was working on a client project in which our team were required to write an ETL process. The ETL process was mostly written using SQL Server Integration Services (SSIS). In one part of the ETL process, we needed to do some fuzzy string similarity matching. We found that the Levenshtein string distance algorithm suited our requirements.
Aside: The Levenshtein algorithm takes two strings and computes the "edit distance" which indicates how many transformations are necessary to reach a given target string from a given source string. For example, the function Levenshtein("test", "tear") would return the value two.
In our case, we wanted to abstract the Levenshtein output into a percentage, so for example, the function GetTextSimilarity("test", "tear") would return 50% - indicating that the inputs are half similar.
Creating the Class Library
The full code for the class library is available on GitHub.
The first step is to create a .NET class library which targets a .NET framework version that the flavour of SQL Server you are working with supports. In my case, we created a C# class library targeting .NET framework 4.5 as we were going to use the class library in Microsoft SQL Server 2014.
Ensure that the method you want to call from SQL Server is marked static. You will need to decorate the method with the Microsoft.SqlServer.Server.SqlProcedure attribute. If you look at my class library code, you'll find two classes, namely, LevenshteinDistance and StringCompare. The LevenshteinDistance class contains the algorithm implementation in a static method marked with the SqlProcedure attribute. Similarly, the StringCompare method (shown below) contains one method, GetTextSimilarity, which is the method we want to call as a function from SQL server.
using System; | |
using Microsoft.SqlServer.Server; | |
namespace SqlLibrary.Text | |
{ | |
public class StringCompare | |
{ | |
[SqlProcedure] | |
public static double GetTextSimilarity( | |
string inputOne, string inputTwo | |
) | |
{ | |
if (inputOne == null) | |
throw new ArgumentNullException(nameof(inputOne)); | |
if (inputTwo == null) | |
throw new ArgumentNullException(nameof(inputTwo)); | |
if (inputOne == string.Empty && inputTwo == string.Empty) | |
return 1; | |
int distance = LevenshteinDistance | |
.GetLevenshteinDistance(inputOne, inputTwo); | |
return 1.0 - distance / | |
(double)Math.Max(inputOne.Length, inputTwo.Length); | |
} | |
} | |
} |
Registering the Class Library in SQL Server
Once you've finished writing the class library and you are ready to call your method(s) from SQL Server, you'll need to execute the following SQL script (obviously, you'll need to replace the database name, assembly file path, function name and method namespace).
USE MyDatabase | |
GO | |
CREATE ASSEMBLY SqlLibrary from 'C:\SqlLibrary.dll' WITH PERMISSION_SET = SAFE | |
GO | |
CREATE FUNCTION GetTextSimilarity(@inputOne [nvarchar](4000), @inputTwo [nvarchar](4000)) | |
RETURNS [float] WITH EXECUTE AS CALLER, RETURNS NULL ON NULL INPUT | |
AS | |
EXTERNAL NAME [SqlLibrary].[SqlLibrary.Text.StringCompare].[GetTextSimilarity] | |
GO | |
sp_configure 'clr enabled', 1 | |
GO | |
RECONFIGURE | |
GO |
Once the script has finished executing, you will find the registered assembly under the Database\Programmability\Assemblies node in SQL Server Management Studio.
If you look under Database\Programmability\Functions, you'll find the method you registered earlier (see SQL script).
You can now call your method as a standard T-SQL function call. Quite useful!