Sunday 28 July 2013

Implicit/Explicit Conversion Operators

I saw some code yesterday that looked odd at first glance. A method accepted a parameter of a custom type named Money, yet the calling code was passing in a double value. After a bit of confusion, I remembered that C# supports user-defined conversion operators.

If you have a custom type, you can define conversion operators on it. These convert from your type to a target type, and from the target type back to your type. A conversion operator is declared as a public static method, and either its parameter type or its return type must be the type that declares it. Each conversion is either implicit or explicit. An implicit conversion lets users of your type convert to the target type without writing a cast. An explicit conversion requires a cast.
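C#'s built-in numeric types follow the same rule, which makes a useful reference point before looking at a custom type. The small program below is only an illustration, and its class and variable names are made up. Widening an int to a long happens implicitly because no information can be lost. Narrowing a double to an int needs a cast because the fractional part is discarded.

```csharp
using System;

public static class NumericConversionDemo
{
    public static void Main()
    {
        // int -> long cannot lose information, so C# converts implicitly
        int count = 42;
        long widened = count;

        // double -> int discards the fractional part, so a cast is required.
        // Writing "int truncated = price;" would not compile.
        double price = 10.7;
        int truncated = (int)price; // truncates toward zero

        Console.WriteLine(widened);   // 42
        Console.WriteLine(truncated); // 10
    }
}
```

Without the (int) cast, the compiler reports the same kind of "Cannot implicitly convert type" error that appears later for Money.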

To demonstrate this, we'll use the case I came across. Imagine we have a Money class made up of a currency and the value the money represents. We may initially want the Money class to be flexible, so that users of the class can treat it as a value (a double in this case). Wherever a double is expected (e.g. as a method argument), the user of the class can then pass a Money instance, and it will be implicitly converted to its value. We can also make the conversion from double to Money implicit, so that wherever a Money instance is expected, we can pass in a double. Our initial attempt at the Money class would therefore look something like this:
public class Money
{
    public string Currency { get; set; }
    public double Value { get; set; }

    // Implicit conversion from Money to double
    public static implicit operator double(Money money)
    {
        return money.Value;
    }

    //  Implicit conversion from double to Money
    public static implicit operator Money(double @double)
    {
        return new Money { Value = @double };
    }
}
With the above class, the following code would therefore compile:
var money = new Money() { Currency = "GBP", Value = 10.5 };
            
// Implicitly convert Money to a double
double moneyValue = money;

// Convert a double to Money (Money.Currency will be null!)
Money moneyInstance = moneyValue;
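This is exactly the situation I described at the start. Because both conversions are implicit, a double can be passed directly to a method that expects Money, and a Money can be passed to a method that expects a double. The sketch below shows both directions. HasCurrency and Twice are hypothetical methods made up for the illustration, and the Money class is repeated so that the program compiles on its own.

```csharp
using System;

public class Money
{
    public string Currency { get; set; }
    public double Value { get; set; }

    // Implicit conversion from Money to double (the currency is dropped)
    public static implicit operator double(Money money)
    {
        return money.Value;
    }

    // Implicit conversion from double to Money (the currency is left null)
    public static implicit operator Money(double @double)
    {
        return new Money { Value = @double };
    }
}

public static class MethodArgumentDemo
{
    // Hypothetical method that expects a Money argument
    public static bool HasCurrency(Money money)
    {
        return money.Currency != null;
    }

    // Hypothetical method that expects a double argument
    public static double Twice(double amount)
    {
        return amount * 2;
    }

    public static void Main()
    {
        // A double is accepted where Money is expected...
        Console.WriteLine(HasCurrency(10.5)); // False

        // ...and a Money is accepted where a double is expected
        var money = new Money { Currency = "GBP", Value = 10.5 };
        Console.WriteLine(Twice(money)); // 21
    }
}
```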
Notice that converting a Money instance to a double loses information (the currency in this case), and converting a double to a Money never sets the currency. Neither conversion is straightforward. In this scenario it therefore makes more sense to use explicit conversion operators, because the required cast forces users to recognise that they may lose information or end up with a partially initialised Money instance. To make the conversions explicit, the only change needed is to replace the "implicit" keyword with "explicit" on the two conversion operators. The Money class would now look like this:
public class Money
{
    public string Currency { get; set; }
    public double Value { get; set; }

    // Explicit conversion from Money to double
    public static explicit operator double(Money money)
    {
        return money.Value;
    }

    //  Explicit conversion from double to Money
    public static explicit operator Money(double @double)
    {
        return new Money { Value = @double };
    }
}
After making this change, the example calling code above would fail to compile with the following two compile-time error messages:

Cannot implicitly convert type 'ConversionOperators.Money' to 'double'. An explicit conversion exists (are you missing a cast?)
Cannot implicitly convert type 'double' to 'ConversionOperators.Money'. An explicit conversion exists (are you missing a cast?)

As the errors point out, we now need to cast explicitly between the types:
var money = new Money() { Currency = "GBP", Value = 10.5 };
            
double moneyValue = (double) money;

Money moneyInstance = (Money) moneyValue;
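These operators still leave one case open: a null Money. Money is a class, so casting a null reference to double calls the operator with null, and reading money.Value then throws a NullReferenceException. The sketch below shows one option that is not part of the original example. It adds a guard that throws an ArgumentNullException instead, so the error at least names the offending parameter. NullGuardDemo is a made-up name for the illustration.

```csharp
using System;

public class Money
{
    public string Currency { get; set; }
    public double Value { get; set; }

    // Explicit conversion from Money to double.
    // Guard added for this sketch: without it, casting a null Money
    // throws a NullReferenceException when money.Value is read.
    public static explicit operator double(Money money)
    {
        if (money == null)
            throw new ArgumentNullException(nameof(money));

        return money.Value;
    }

    // Explicit conversion from double to Money (the currency is left null)
    public static explicit operator Money(double @double)
    {
        return new Money { Value = @double };
    }
}

public static class NullGuardDemo
{
    public static void Main()
    {
        Money missing = null;

        try
        {
            var value = (double)missing;
            Console.WriteLine(value);
        }
        catch (ArgumentNullException ex)
        {
            // Prints "Caught: money"
            Console.WriteLine("Caught: " + ex.ParamName);
        }
    }
}
```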

You can download the example code by clicking here.

Sunday 7 July 2013

C# How To: Implement the Soundex Algorithm

I caught the end of a conversation about the Soundex algorithm at work the other day, and it inspired me to write an implementation in C#. If you are not familiar with Soundex, the Wikipedia article on Soundex is a good place to start. I first came across this algorithm in a Natural Language Processing module at university. In a nutshell, applying the Soundex algorithm to a word produces a Soundex code: the word's first letter followed by three digits that encode the consonants after it. Words that are spelt differently but sound the same (homophones) should produce the same Soundex code. For instance, "to" and "two" are spelt differently but sound the same, and both produce the Soundex code "T000".

This is a useful little algorithm. I particularly like it for its simplicity, and because its heuristics work well in most cases. One limitation is that words which sound the same but start with different letters get different codes: for example, "site" produces S300 and "cite" produces C300. Soundex is useful when writing search functionality that needs to tolerate misspellings in the user's query. It's worth pointing out that SQL Server supports Soundex natively through the T-SQL SOUNDEX function.

My C# implementation is below. I wrote it as a static class that exposes a single public method, For. The example source code is available on GitHub: https://github.com/rsingh85/SoundexExample

using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class Soundex
{
    public static string For(string word)
    {
        const int MaxSoundexCodeLength = 4;

        var soundexCode = new StringBuilder();
        var previousWasHOrW = false;

        // Upper-case the input and remove every character that is
        // neither a word character nor whitespace
        word = Regex.Replace(
            word == null ? string.Empty : word.ToUpper(),
            @"[^\w\s]",
            string.Empty);

        // Nothing usable left: return an all-zero code
        if (string.IsNullOrEmpty(word))
            return string.Empty.PadRight(MaxSoundexCodeLength, '0');

        // The first letter is always kept as-is
        soundexCode.Append(word.First());

        for (var i = 1; i < word.Length; i++)
        {
            var numberCharForCurrentLetter =
                GetCharNumberForLetter(word[i]);

            // Skip the second letter if it has the same code as the first letter
            if (i == 1 &&
                numberCharForCurrentLetter ==
                    GetCharNumberForLetter(soundexCode[0]))
                continue;

            // Letters with the same code separated by H or W are coded once
            if (soundexCode.Length > 2 && previousWasHOrW &&
                numberCharForCurrentLetter ==
                    soundexCode[soundexCode.Length - 2])
                continue;

            // Collapse adjacent letters that share the same code
            if (soundexCode.Length > 0 &&
                numberCharForCurrentLetter ==
                    soundexCode[soundexCode.Length - 1])
                continue;

            soundexCode.Append(numberCharForCurrentLetter);

            previousWasHOrW = "HW".Contains(word[i]);
        }

        // Drop the '0' placeholders, then pad or truncate to four characters
        return soundexCode
            .Replace("0", string.Empty)
            .ToString()
            .PadRight(MaxSoundexCodeLength, '0')
            .Substring(0, MaxSoundexCodeLength);
    }

    // Maps a letter to its Soundex digit; vowels, H, W and Y map to '0'
    private static char GetCharNumberForLetter(char letter)
    {
        if ("BFPV".Contains(letter)) return '1';
        if ("CGJKQSXZ".Contains(letter)) return '2';
        if ("DT".Contains(letter)) return '3';
        if ('L' == letter) return '4';
        if ("MN".Contains(letter)) return '5';
        if ('R' == letter) return '6';

        return '0';
    }
}
Example:

// Both lines below output R100
Console.WriteLine(Soundex.For("Ravi"));
Console.WriteLine(Soundex.For("Ravee"));
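To make the search use case mentioned above concrete, here is a sketch that filters a list of names by Soundex code. So that the block compiles on its own, it includes a condensed American Soundex written for this sketch. SoundexCode, DigitFor, SoundsLike and SoundexSearchDemo are hypothetical names and are not part of the GitHub project. In practice you would call Soundex.For instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class SoundexSearchDemo
{
    // Condensed American Soundex written for this sketch (not the author's Soundex.For)
    public static string SoundexCode(string word)
    {
        // Keep only the letters A-Z, upper-cased
        var letters = (word ?? string.Empty)
            .ToUpperInvariant()
            .Where(c => c >= 'A' && c <= 'Z')
            .ToArray();

        if (letters.Length == 0)
            return string.Empty;

        var code = new StringBuilder();
        code.Append(letters[0]);
        var previousDigit = DigitFor(letters[0]);

        for (var i = 1; i < letters.Length && code.Length < 4; i++)
        {
            var digit = DigitFor(letters[i]);

            // Append a digit unless it is uncoded ('0') or repeats the previous code
            if (digit != '0' && digit != previousDigit)
                code.Append(digit);

            // H and W do not separate letters with the same code; vowels and Y do
            if (letters[i] != 'H' && letters[i] != 'W')
                previousDigit = digit;
        }

        return code.ToString().PadRight(4, '0');
    }

    private static char DigitFor(char letter)
    {
        if ("BFPV".IndexOf(letter) >= 0) return '1';
        if ("CGJKQSXZ".IndexOf(letter) >= 0) return '2';
        if ("DT".IndexOf(letter) >= 0) return '3';
        if (letter == 'L') return '4';
        if ("MN".IndexOf(letter) >= 0) return '5';
        if (letter == 'R') return '6';
        return '0'; // vowels, H, W and Y are not coded
    }

    // Hypothetical search helper: candidates whose Soundex code matches the query's
    public static List<string> SoundsLike(string query, IEnumerable<string> candidates)
    {
        var target = SoundexCode(query);
        return candidates.Where(c => SoundexCode(c) == target).ToList();
    }

    public static void Main()
    {
        var names = new[] { "Robert", "Rupert", "Rubin", "Ashcraft" };

        // "Robbert" (a misspelling) is R163, as are Robert and Rupert; Rubin is R150
        foreach (var name in SoundsLike("Robbert", names))
            Console.WriteLine(name);
    }
}
```

Running it prints Robert and Rupert. Comparing codes rather than raw strings is what lets the misspelt query still find both names.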