Inner Working Mechanism of BBCode

Let's start with this example:

string Fields = "{url};{displaytext}";
string InputSyntax = "[url={url}]{displaytext}[/url]";
string HtmlSyntax = "<a href=\"{url}\">{displaytext}</a>"; 
string input = "[url=http://www.google.com]Google[/url]"; // some input text
string output = BBCode.ConvertToHtml(input, InputSyntax, HtmlSyntax, Fields);

After BBCode.Net receives the InputSyntax (the BBCode pattern), it starts by analyzing the structure of the syntax.

It loops through all the provided fields and replaces each one with a temporary placeholder string.

From this,

 [url={url}]{displaytext}[/url] 

we get:

[url=^`````````````^]^`````````````^[/url] 
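The placeholder step can be sketched as follows. This is a minimal illustration, not BBCode.Net's actual source: the class name, the method name and the exact placeholder length (13 backticks, as in the example above) are assumptions.

```csharp
using System;

public static class PlaceholderDemo
{
    // Assumed placeholder: a caret, 13 backticks and a caret, as in the example above.
    public static readonly string TempValue = "^" + new string('`', 13) + "^";

    // Replaces every field name in the semicolon-separated 'fields' list
    // with the placeholder.
    public static string ReplaceFields(string inputSyntax, string fields)
    {
        string result = inputSyntax;
        foreach (string field in fields.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
        {
            result = result.Replace(field, TempValue);
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceFields("[url={url}]{displaytext}[/url]", "{url};{displaytext}"));
    }
}
```

The placeholder only has to be a string that is unlikely to appear in a real InputSyntax and that is not changed by the escaping step that follows.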

Next, replace this placeholder

 ^`````````````^ 

with a regex expression, to construct a regex search pattern that matches this condition:

Look for any block of text that matches this pattern:

[url= {any characters} ] {any characters} [/url]

Start the regex syntax replacement process:

string tempInputSyntax = oriInputSyntax.Replace("\\", "\\\\")
                                       .Replace(".", "\\.")
                                       .Replace("{", "\\{")
                                       .Replace("}", "\\}")
                                       .Replace("[", "\\[")
                                       .Replace("]", "\\]")
                                       .Replace("+", "\\+")
                                       .Replace("$", "\\$")
                                       .Replace(" ", "\\s")
                                       .Replace("#", "[0-9]")
                                       .Replace("?", ".")
                                       .Replace("*", "\\w*")
                                       .Replace("%", ".*"); 
string _regexValue = ".+?";  
string RegexPattern = tempInputSyntax.Replace(_tempValueStr, _regexValue); 
return RegexPattern;

This is the regex search pattern for this BBCode:

"\\[url=.+?\\].+?\\[/url\\]"  

In regex, "." (dot) matches any character. The lazy quantifier "+?" repeats the previous item one or more times, but as few times as possible, so the match stops at the next fixed part of the pattern. The brackets "[" and "]" normally define a character class. Here, however, they are not used as regex operators; they are literal characters in the non-field blocks, so we escape them with a backslash "\". In C# source code they become "\\[" and "\\]". Why the double backslash "\\"? Because the backslash is also the escape character in regular C# string literals: the compiler rejects "\[" as an unrecognized escape sequence. (A verbatim string such as @"\[" avoids the doubling.)

However, if you enter the regex pattern in a UI field rather than in code-behind, the double backslash is not needed:

\[url=.+?\].+?\[/url\] 
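To see why the lazy "+?" matters, the sketch below counts matches for the pattern above and for a greedy variant. The class and method names are made up for illustration, and the greedy pattern is not part of BBCode.Net; it is shown only for comparison.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyQuantifierDemo
{
    // Verbatim string: the same pattern as "\\[url=.+?\\].+?\\[/url\\]".
    public const string LazyPattern = @"\[url=.+?\].+?\[/url\]";

    // Greedy variant, for comparison only.
    public const string GreedyPattern = @"\[url=.+\].+\[/url\]";

    public static int CountMatches(string text, string pattern)
    {
        return Regex.Matches(text, pattern).Count;
    }

    public static void Main()
    {
        string text = "See [url=http://www.google.com]Google[/url] and [url=http://www.yahoo.com]Yahoo[/url].";
        Console.WriteLine(CountMatches(text, LazyPattern));   // 2: one match per block
        Console.WriteLine(CountMatches(text, GreedyPattern)); // 1: both blocks swallowed by one match
    }
}
```

With the greedy pattern, the single match runs from the first "[url=" to the last "[/url]", so the blocks could no longer be processed one at a time.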

Read more: Regular Expression Basic Syntax

Notice that if the value of the first field contains the closing bracket "]", it breaks the syntax. There are two possible results:

  1. The regex cannot locate the InputSyntax and ignores it. The InputSyntax is not processed.
  2. The invalid values are still passed into the HtmlSyntax and the HTML block is generated, but with unexpected values inside. Example:
    <a href="http://www[codeproject]com">The [odeProje[t</a>

Next, perform the pattern search within the Input Text

Example of text:

Lots of programming tips can be found through search engines, for example: [url=http://www.google.com]Google[/url], [url=http://www.yahoo.com]Yahoo[/url], [url=http://www.bing.com]Bing[/url], etc. Ebooks are available too.

Searching the text for InputSyntax with Regex:

using System.Text.RegularExpressions;

MatchCollection mc = Regex.Matches(text, RegexPattern);
Response.Write(mc.Count.ToString()); // Result: 3
foreach (Match m in mc)
{
    // Each match is one complete BBCode block
    string customInsertPart = m.Value;
    Response.Write(customInsertPart);
}

Three blocks are identified and extracted:

[url=http://www.google.com]Google[/url]
[url=http://www.yahoo.com]Yahoo[/url]
[url=http://www.bing.com]Bing[/url] 

Extract the Values

The following values were obtained in the previous steps:

string customInsertPart = "[url=http://www.google.com]Google[/url]";
string oriInputSyntax = "[url=^`````````````^]^`````````````^[/url]";
string _tempValueStr = "^`````````````^";

string[] nonFieldArray = oriInputSyntax.Split(new string[] { _tempValueStr }, StringSplitOptions.RemoveEmptyEntries); 
return nonFieldArray;  

The structure of the non-field blocks is identified in nonFieldArray:

Blocks    Text       Length
------    ------     ------ 
0         [url=        5
1         ]            1
2         [/url]       6 
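The split can be tried in isolation with the sketch below. The class name, the method name and the short "#F#" placeholder are assumptions chosen for readability; any placeholder string behaves the same way.

```csharp
using System;

public static class SplitDemo
{
    // Splits a syntax string on the placeholder and returns the non-field blocks.
    public static string[] GetNonFieldBlocks(string syntaxWithPlaceholders, string placeholder)
    {
        return syntaxWithPlaceholders.Split(new[] { placeholder }, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        string[] blocks = GetNonFieldBlocks("[url=#F#]#F#[/url]", "#F#");
        for (int i = 0; i < blocks.Length; i++)
        {
            // Prints the block index, its text and its length, like the table above
            Console.WriteLine(i + "  " + blocks[i] + "  " + blocks[i].Length);
        }
    }
}
```

Note that StringSplitOptions.RemoveEmptyEntries drops the empty block that would be produced when the syntax starts with a field or when two fields are adjacent, so the block indexes only line up with the fields when every field sits between two non-field blocks.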

Get Field Index:

var _idxFields = new Dictionary<int, string>(); 
for (int i = 0; i < nonFieldArray.Length; i++)
{
    // Remove non Field Block
    inputSyntax = inputSyntax.Substring(nonFieldArray[i].Length, inputSyntax.Length - nonFieldArray[i].Length);
    // Get Field index
    foreach (string s in _fields)
    {
        if (inputSyntax.Length < s.Length)
            break;
        // Take as many characters as the field name is long
        string b = inputSyntax.Substring(0, s.Length);

        // If they match the current field's name
        if (b == s)
        {
            // Add the field and index into dictionary
            _idxFields[i] = b;
            // Remove field from inputSyntax
            inputSyntax = inputSyntax.Substring(s.Length, inputSyntax.Length - s.Length);
            break;
        }
    }
} 
return _idxFields;  

The structure of the InputSyntax has now been analyzed.

Result: 

_idxFields
Count = 2
    [0]: {[0, {url}]}
    [1]: {[1, {displaytext}]} 

Get values:

var _idxValues = new Dictionary<int, string>(); 
for (int i = 0; i < nonFieldArray.Length; i++)
{
    // Remove non field block
    customPart = customPart.Substring(nonFieldArray[i].Length, customPart.Length - nonFieldArray[i].Length);
 
    // Current non-field block is the last block
    // Terminate the loop.
    // No more value block should exist after last block
    if (i + 1 >= nonFieldArray.Length)
        break;
 
    // Detect next non-field block and calculate value length
    int v =  customPart.IndexOf(nonFieldArray[i+1]);
 
    // Get the index and value into dictionary
    _idxValues[i] = customPart.Substring(0, v);
 
    // Remove the added value from input text
    customPart = customPart.Substring(v, customPart.Length - v);
}

The values are now obtained and stored in _idxValues:

_idxValues
Count = 2
    [0]: {[0, http://www.google.com]}
    [1]: {[1, Google]}
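For comparison, the same two values can also be read with named capture groups. This is an alternative technique, not the way BBCode.Net extracts values; the pattern, class name and method name below are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Hypothetical pattern: the named groups stand in for {url} and {displaytext}.
    public const string Pattern = @"\[url=(?<url>.+?)\](?<displaytext>.+?)\[/url\]";

    // Returns { url, displaytext } for a matching block, or an empty array otherwise.
    public static string[] Extract(string block)
    {
        Match m = Regex.Match(block, Pattern);
        if (!m.Success)
            return new string[0];
        return new[] { m.Groups["url"].Value, m.Groups["displaytext"].Value };
    }

    public static void Main()
    {
        string[] values = Extract("[url=http://www.google.com]Google[/url]");
        Console.WriteLine(values[0]); // http://www.google.com
        Console.WriteLine(values[1]); // Google
    }
}
```

The trade-off is that the pattern has to be built with one group per field, whereas the approach described on this page works from the non-field blocks alone.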

HTML & Script Injection Prevention

// Loop through all values. Iterate over a copy of the keys,
// because the dictionary is modified inside the loop.
foreach (int key in new List<int>(_idxValues.Keys))
{
    // A value containing "<" (raw or already encoded) may carry HTML or script
    string value = _idxValues[key];
    if (value.Contains("<") || value.Contains("&lt;"))
    {
        // Blank out the suspicious value
        _idxValues[key] = "";

        StringBuilder sb = new StringBuilder();
        // Recombine the non-field blocks with the filtered values
        for (int n = 0; n < nonFieldArray.Length; n++)
        {
            sb.Append(nonFieldArray[n]);
            if (_idxFields.ContainsKey(n))
                sb.Append(_idxValues[n]);
        }
        // Return the filtered text; the HTML conversion is skipped.
        return sb.ToString();
    }
}

Fill All Extracted Values into the HtmlSyntax's Fields

foreach (KeyValuePair<int, string> kv in _idxFields)
{
    HtmlSyntax = HtmlSyntax.Replace(kv.Value, _idxValues[kv.Key].Replace("<", "&lt;"));
} 
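If stricter escaping is wanted, the fill-in step could encode each value with System.Net.WebUtility.HtmlEncode, which also covers ">", "&" and quotes. The sketch below is a variation under that assumption, not BBCode.Net's code; the class name, the method name and the dictionary keyed by field name are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class HtmlFillDemo
{
    // Replaces each field name in the HTML template with its HTML-encoded value.
    public static string Fill(string htmlSyntax, IDictionary<string, string> values)
    {
        string result = htmlSyntax;
        foreach (KeyValuePair<string, string> kv in values)
        {
            result = result.Replace(kv.Key, WebUtility.HtmlEncode(kv.Value));
        }
        return result;
    }

    public static void Main()
    {
        var values = new Dictionary<string, string>
        {
            { "{url}", "http://www.google.com" },
            { "{displaytext}", "A<B>" }
        };
        Console.WriteLine(Fill("<a href=\"{url}\">{displaytext}</a>", values));
    }
}
```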

Final step: replace the matched block in the text with the filled-in HtmlSyntax

text = text.Replace(customInsertPart, HtmlSyntax);

Last edited Jan 20, 2013 at 5:00 PM by adriancs, version 6
