Brief Context
I was working on a client project in which our team were required to write an ETL process. The ETL process was mostly written using SQL Server Integration Services (SSIS). In one part of the ETL process, we needed to do some fuzzy string similarity matching. We found that the Levenshtein string distance algorithm suited our requirements.
Aside: The Levenshtein algorithm takes two strings and computes the "edit distance" which indicates how many transformations are necessary to reach a given target string from a given source string. For example, the function Levenshtein("test", "tear") would return the value two.
In our case, we wanted to abstract the Levenshtein output into a percentage, so for example, the function GetTextSimilarity("test", "tear") would return 50% - indicating that the inputs are half similar.
Creating the Class Library
The full code for the class library is available on GitHub.
The first step is to create a .NET class library which targets a .NET framework version that the flavour of SQL Server you are working with supports. In my case, we created a C# class library targeting .NET framework 4.5 as we were going to use the class library in Microsoft SQL Server 2014.
Ensure that the method you want to call from SQL Server is marked static. You will need to decorate the method with the Microsoft.SqlServer.Server.SqlProcedure attribute. If you look at my class library code, you'll find two classes, namely, LevenshteinDistance and StringCompare. The LevenshteinDistance class contains the algorithm implementation in a static method marked with the SqlProcedure attribute. Similarly, the StringCompare method (shown below) contains one method, GetTextSimilarity, which is the method we want to call as a function from SQL server.
The GetTextSimilarity method calls the LevenshteinDistance.GetLevenshteinDistance method and returns the percentage similarity value that we required. Note that only the method you want to call as a function requires the SqlProcedure attribute. In this case, we could have only marked the GetTextSimilarity method with this attribute as this is the only method we want to call from SQL Server.
Registering the Class Library in SQL Server
Once you've finished writing the class library and you are ready to call your method(s) from SQL Server, you'll need to execute the following SQL script (obviously, you'll need to replace the database name, assembly file path, function name and method namespace).
Once the script has finished executing, you will find the registered assembly under the Database\Programmability\Assemblies node in SQL Server Management Studio.
If you look under Database\Programmability\Functions, you'll find the method you registered earlier (see SQL script).
You can now call your method as a standard T-SQL function call. Quite useful!