The Highlighter class

Open up your text editor project from the last chapter and create a new file in its root folder called highlighter.py. We can now write our Highlighter class in here:

import tkinter as tk


class Highlighter:
    def __init__(self, text_widget):
        self.text_widget = text_widget
        self.numbers_color = "blue"
        self.keywords_color = "orange"
        self.keywords = ["True", "False", "def", "for", "while", "import",
                         "if", "elif", "else"]
        self.disallowed_previous_chars = ["_", "-", "."]

        self.text_widget.tag_configure("keyword", foreground=self.keywords_color)
        self.text_widget.tag_configure("number", foreground=self.numbers_color)

        self.text_widget.bind('<KeyRelease>', self.on_key_release)

Our Highlighter class will need to keep a reference to the Text widget which it is highlighting, so we require this as a parameter within our __init__ method.

To start with, we will use a couple of attributes to hold the color of our numbers and keywords (we will make a more elegant solution shortly). Numbers will be blue and keywords will be orange. We also need a way of determining what a keyword is, so we hold a list of them too.

Both of the tags we will be using are configured to change the text color of any ranges which have the tags applied, and we bind a method to the <KeyRelease> event:

    def on_key_release(self, event=None):
        self.highlight()

The on_key_release method will just call a highlight method, which is responsible for changing the color of our numbers and keywords:

    def highlight(self, event=None):
        length = tk.IntVar()
        for keyword in self.keywords:
            start = 1.0
            keyword = keyword + "[^A-Za-z_-]"
            idx = self.text_widget.search(keyword, start, stopindex=tk.END,
                                          count=length, regexp=1)

We begin the highlight method with an IntVar which will hold the length of any match. We now need to iterate through all of our keywords and search for them inside the Text widget.

Since we want to start at the beginning of the text, we set the start variable to 1.0.

As we want only whole-word matches, we are going to need to use the power of regex. The string [^A-Za-z_-] is added onto the end of each keyword. This small piece of regex is saying that we only want to match the word if it does not have any alphabet characters, an underscore, or a dash after it. For example, if we are searching to highlight the word for, we do not want to highlight the first three letters of fortress. This regex will prevent a match on the word fortress as it has an alphabet character (a "t") after the word for.
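As a quick sanity check of this pattern, here is a minimal sketch that uses Python's own re module rather than Tcl's engine. That substitution is an assumption made purely so the example runs without a window; this particular pattern only uses syntax which the two engines share. The helper name trailing_pattern is hypothetical.

```python
import re


def trailing_pattern(keyword):
    """Build the whole-word pattern described above (hypothetical helper)."""
    return keyword + "[^A-Za-z_-]"


pattern = trailing_pattern("for")

# "for" followed by a space matches, and the match includes that space.
match = re.search(pattern, "for item in items:")
print(repr(match.group(0)))

# "fortress" does not match because a "t" follows "for".
print(re.search(pattern, "fortress"))

# Digits are not excluded by the character class, so "for2" still matches.
print(re.search(pattern, "for2") is not None)
```

Note that the match is always one character longer than the keyword itself, which is why the code that follows subtracts one when working out where the keyword ends.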

Our syntax highlighting will rely on regular expressions a lot. All programming languages have slight differences in capabilities when it comes to interpreting them. Tcl, the language behind Tkinter, differs from Python itself. To learn about Tcl's implementation of regular expressions, visit the Tcl documentation at http://wiki.tcl.tk.

We now search for the modified keyword with the Text widget's search method, passing in the argument regexp=1 to tell it that our pattern text should be interpreted as a regular expression:

            while idx:
                char_match_found = int(str(idx).split('.')[1])
                line_match_found = int(str(idx).split('.')[0])
                if char_match_found > 0:
                    previous_char_index = (str(line_match_found) + '.' +
                                           str(char_match_found - 1))
                    previous_char = self.text_widget.get(previous_char_index,
                                                         previous_char_index + "+1c")

As we have seen before, a while loop is utilized to ensure we will find every match within the Text widget.

To continue safeguarding against non-whole-word matches, we will need to check the character before the beginning of our match index. We cannot use the same kind of regular expression as before, since requiring a character in front of the keyword would prevent matches which occur at the very beginning of a line (as is common with import statements).

In order to do this, we first split the index of the match on the full stop, separating the line number from the character number. We then check whether the character number is greater than 0, since this implies that the match was not the first word on this line. If it is indeed greater, then we will need to check if the preceding character is one of our disallowed ones:

                    if (previous_char.isalnum() or
                            previous_char in self.disallowed_previous_chars):
                        end = f"{idx}+{length.get() - 1}c"
                        start = end
                        idx = self.text_widget.search(keyword, start, stopindex=tk.END,
                                                      regexp=1)

If the character before our match is a letter, number, underscore, hyphen, or full stop, then we do not want to highlight the match. We do, however, need to continue the search from the end of our detected word.
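The index arithmetic and the disallowed-character test can be tried out without a Text widget. The following is a minimal sketch which works on plain strings; the helper names previous_index and is_disallowed are hypothetical and are not part of our Highlighter class.

```python
def previous_index(idx):
    """Return the index one character before idx, or None at column 0."""
    line, char = (int(part) for part in str(idx).split("."))
    if char == 0:
        # The match starts the line, so there is no previous character.
        return None
    return f"{line}.{char - 1}"


def is_disallowed(previous_char, disallowed=("_", "-", ".")):
    """Mirror the check in highlight: letters, digits, "_", "-" and "."."""
    return previous_char.isalnum() or previous_char in disallowed


print(previous_index("3.5"))   # 3.4
print(previous_index("1.0"))   # None
print(is_disallowed("x"))      # True
print(is_disallowed(" "))      # False
```

A tuple is used for the disallowed characters here so that an empty string is never treated as a member of it.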

To find the ending index of our match, we will use the +nc string. We add on the length of the match, which is stored in our length variable, but take one away to account for the character which will have been captured by our [^A-Za-z_-] regex.

We then set the start of our search to this ending index and fire off the search method once again to continue the loop:

                    else:
                        end = f"{idx}+{length.get() - 1}c"
                        self.text_widget.tag_add("keyword", idx, end)

                        start = end
                        idx = self.text_widget.search(keyword, start, stopindex=tk.END,
                                                      regexp=1)

If the previous character was not a disallowed one, then we can add the tag. We get the ending index in the same way as in the preceding code and use the tag_add method to add the necessary tag to the discovered range.

We then set the start to the calculated end and resume the search:

                else:
                    end = f"{idx}+{length.get() - 1}c"
                    self.text_widget.tag_add("keyword", idx, end)

                    start = end
                    idx = self.text_widget.search(keyword, start, stopindex=tk.END,
                                                  regexp=1)

If the match was found at character 0, then we do not need to check the previous character since there isn't one. We can go ahead and add the tag and continue the search in the same way.
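To see the whole keyword loop in one place, here is a minimal sketch of the same logic applied to a single plain string instead of a Text widget. It uses Python's re module in place of the widget's search method, which is an assumption made so that it runs without a window. The function name find_whole_words is hypothetical, and the call to re.escape is an addition which only matters if a keyword contains regex metacharacters.

```python
import re

DISALLOWED_PREVIOUS_CHARS = ("_", "-", ".")


def find_whole_words(line, keyword):
    """Return (start, end) column pairs for whole-word matches in one line."""
    pattern = re.compile(re.escape(keyword) + "[^A-Za-z_-]")
    spans = []
    start = 0
    match = pattern.search(line, start)
    while match:
        # Drop the extra character captured by the trailing character class.
        end = match.end() - 1
        # At column 0 there is no previous character to check.
        previous_char = line[match.start() - 1] if match.start() > 0 else ""
        if not (previous_char.isalnum()
                or previous_char in DISALLOWED_PREVIOUS_CHARS):
            spans.append((match.start(), end))
        # Resume the search from the end of the keyword, as highlight does.
        start = end
        match = pattern.search(line, start)
    return spans


print(find_whole_words("for x in y: before for\n", "for"))
# [(0, 3), (19, 22)]
```

The "for" inside "before" is skipped because an "e" follows it, while the two standalone occurrences are reported.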

Once all of our keywords are tagged, we need to do the same with numbers. As we cannot store a list of all numbers we will need to use another regular expression. We can then run this through much the same loop as before:

        start = 1.0
        idx = self.text_widget.search(r"(\d)+[.]?(\d)*", start, stopindex=tk.END,
                                      regexp=1, count=length)
        while idx:
            end = f"{idx}+{length.get()}c"
            self.text_widget.tag_add("number", idx, end)

            start = end
            idx = self.text_widget.search(r"(\d)+[.]?(\d)*", start,
                                          stopindex=tk.END, regexp=1, count=length)

The regular expression used here can be broken down as follows:

  • (\d)+: Match one or more digits
  • [.]?: Match zero or one decimal point
  • (\d)*: Match zero or more digits following the decimal point

This regex will allow us to tag and color both integers and floating point numbers.
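The behaviour of this pattern can be checked quickly outside the Text widget. The sketch below assumes the digit class is written as \d and uses Python's re module rather than Tcl's engine, purely for illustration; the function name find_numbers is hypothetical.

```python
import re

NUMBER_PATTERN = r"(\d)+[.]?(\d)*"


def find_numbers(text):
    """Return every substring matched by the number pattern."""
    return [match.group(0) for match in re.finditer(NUMBER_PATTERN, text)]


print(find_numbers("total = 42 + 3.14"))  # ['42', '3.14']
print(find_numbers("x1 = 5."))            # ['1', '5.']
```

The second call shows two quirks worth knowing about: the pattern has no whole-word guard, so the digit inside the identifier x1 is matched, and a trailing decimal point is included in the match.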

We have now tagged all of the keywords in our keywords attribute, as well as all numbers. Another thing which most editors will highlight is strings. These can be detected by searching for characters between either two speech mark characters (") or two apostrophe characters ('). Again, we would need to use a regular expression to match these.

To avoid repeatedly copying the number-highlighting code and making our highlight method huge, let's split it off into a new method. We can call this method highlight_regex and write it to perform our usual tagging loop on any regular expression.

Cut the last block of code from the highlight function and replace it with this:

        self.highlight_regex(r"(\d)+[.]?(\d)*", "number")

Now, let's create a function called highlight_regex and paste the number-highlighting code inside it. Replace the number-detecting regex with an argument called regex, and the number tag with an argument called tag, so that it looks like this:

    def highlight_regex(self, regex, tag):
        length = tk.IntVar()
        start = 1.0
        idx = self.text_widget.search(regex, start, stopindex=tk.END, regexp=1,
                                      count=length)
        while idx:
            end = f"{idx}+{length.get()}c"
            self.text_widget.tag_add(tag, idx, end)

            start = end
            idx = self.text_widget.search(regex, start, stopindex=tk.END,
                                          regexp=1, count=length)

Now that we have this function, we can add two more lines to the end of our highlight method:

        self.highlight_regex(r"['][^']*[']", "string")
        self.highlight_regex(r'["][^"]*["]', "string")

These regexes can be broken down as follows, taking the double-quoted version as the example (the single-quoted version works in the same way):

  • ["]: Match the string opening character (")
  • [^"]*: Match any number of characters which are not the string-closing character
  • ["]: Match the string-closing character

This will now add the string tag to any matches found in the Text widget.

Since we do not have a configured string tag, this will currently not do anything.
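The two string patterns can be tried out in the same way as the number pattern. This is a minimal sketch using Python's re module instead of the Text widget's search method, which is an assumption made for illustration only; the function name find_strings is hypothetical.

```python
import re

SINGLE_QUOTED = r"['][^']*[']"
DOUBLE_QUOTED = r'["][^"]*["]'


def find_strings(text):
    """Return single-quoted matches first, then double-quoted matches."""
    found = []
    for pattern in (SINGLE_QUOTED, DOUBLE_QUOTED):
        found.extend(match.group(0) for match in re.finditer(pattern, text))
    return found


sample = "greet('hi') or say(\"bye\")"
print(find_strings(sample))
```

Writing the double-quote pattern inside a single-quoted Python string avoids having to escape the speech marks.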

Hard-coding colors and keywords can get very tedious and clog up the code dramatically. As well as this, all of our keywords will be the same color, which is less than ideal. Instead of continuing our keyword configuring in this way, let's pass the configuration on to something which is better suited for it and utilize that.

The particular technology I have decided to use for this project is YAML. YAML is a configuration file syntax which has a Python library available to parse it.

Create a folder inside your root named languages and place a file called python.yaml inside. The following will go into that file:

categories:
  keywords:
    color: orange
    matches: [for, def, while, from, import, as, with, self]

  variables:
    color: red4
    matches: ['True', 'False', None]

  conditionals:
    color: green
    matches: [try, except, if, else, elif]

  functions:
    color: blue4
    matches: [int, str, dict, list, set, float]

numbers:
  color: purple

strings:
  color: '#e1218b'

The syntax of YAML should be straightforward. Keys are marked by an ending colon, subkeys are indicated by indentation, and values follow keys on the same line.

A comment in YAML is indicated by a hash (#) character, just like in Python. If we want to use this character when defining a color, we need to enclose it in quotation marks to indicate that it is a string instead.

YAML's syntax is somewhat similar to Python's, so it should be fairly easy to pick up.

In order to read YAML files, we will need an external package. We can once again use a virtual environment to manage these. Enter the following three commands in your terminal, ensuring that you are in the root folder in which you are writing your text editor:

python3 -m venv env
source env/bin/activate
pip install pyyaml

This will install the yaml package into your Python environment, which you can now import at the top of the highlighter.py file, like so:

import yaml

We are now ready to write ourselves a method which will read and parse a given .yaml or .yml file and convert it into a Python dictionary:

    def parse_syntax_file(self):
        with open(self.syntax_file, 'r') as stream:
            try:
                config = yaml.safe_load(stream)
            except yaml.YAMLError as error:
                print(error)
                return

If you have ever worked with opening files in Python before, this should seem very familiar to you.

We use the with keyword in order to open the file which we will have set in our syntax_file attribute. We open it in read mode because we don't need to make any changes to it.

Within a try/except block, we attempt to parse the content of the file using the yaml module's safe_load method. We use safe_load rather than load because it only builds plain Python objects such as dictionaries, lists, and strings, and because recent releases of PyYAML no longer accept a call to load without an explicit Loader argument. The module will raise a YAMLError exception if the file's syntax is incorrect, so we catch that with our except statement, print the error to the console, and return, preventing any highlighting from taking place.

Assuming our YAML file loads without any problems, we can begin reading from the config variable as if it were just a Python dictionary.

If you want to see what the config variable looks like, add a print(config) call after the with statement.

The rest of the method stores the values we need and then configures the tags:

        self.categories = config['categories']
        self.numbers_color = config['numbers']['color']
        self.strings_color = config['strings']['color']

        self.configure_tags()

Within our categories, we have stored different types of keywords, which we can assign different colors. We extract each category from our config dictionary and keep a record of it in our categories attribute. These contain the keyword patterns and colors, so there is no need to keep a separate reference to those anymore.

Our number and string colors are also found in the dictionary and stored in attributes, since this is the only information we need about these.

Now that we have extracted the information we need, it's time to configure our tags so that they have the ability to change the color of any matches.

In order to configure our tags properly, we must get the category name and color out of our categories attribute and pass them to the tag_configure method of our Text widget.

Our categories attribute is a dictionary of dictionaries; one entry will look as follows:

{
    'keywords': {
        'color': 'orange',
        'matches': ['for', 'def', 'while', 'from', 'import', 'as',
                    'with', 'self']
    }
}

To get the relevant information out of this data structure, we will need to iterate over the keys of our categories dictionary and use each key to access its inner color data. We then have the key as our tag name, and the color as its foreground color:

    def configure_tags(self):
        for category in self.categories.keys():
            color = self.categories[category]['color']
            self.text_widget.tag_configure(category, foreground=color)

        self.text_widget.tag_configure("number", foreground=self.numbers_color)
        self.text_widget.tag_configure("string", foreground=self.strings_color)

Using the keys method of our dictionary, we are able to iterate over each category name and pass it back into the self.categories dictionary, along with the 'color' key, to access each category's assigned color.

These two pieces of information are then passed to the tag_configure method to set up a matching tag.

We finish up the method by configuring tags for our numbers and strings too. These use the numbers_color and strings_color attributes, which we extracted earlier.
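To make the lookup concrete, here is a minimal sketch which builds the same tag-name-to-color mapping as a plain dictionary, without touching a Text widget. The function name tag_colors is hypothetical, and the sample data is a cut-down version of what our python.yaml file would produce.

```python
def tag_colors(categories, numbers_color, strings_color):
    """Map each tag name to its foreground color (illustrative helper)."""
    colors = {name: settings["color"] for name, settings in categories.items()}
    # Numbers and strings have no keyword list, only a color.
    colors["number"] = numbers_color
    colors["string"] = strings_color
    return colors


categories = {
    "keywords": {"color": "orange", "matches": ["for", "def"]},
    "variables": {"color": "red4", "matches": ["True", "False", "None"]},
}
print(tag_colors(categories, "purple", "#e1218b"))
```

Each key and value in the resulting dictionary corresponds to one call to tag_configure in the real method.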

In order to make this code run, we need to kick off the chain of methods in our __init__ method. We will also need to receive a path to the YAML file to parse, which we will take as an argument.

Our __init__ method should now look like this:

    def __init__(self, text_widget, syntax_file):
        self.text_widget = text_widget
        self.syntax_file = syntax_file
        self.categories = None
        self.numbers_color = "blue"
        self.strings_color = "red"
        self.disallowed_previous_chars = ["_", "-", "."]

        self.parse_syntax_file()

        self.text_widget.bind('<KeyRelease>', self.on_key_release)

Now that our __init__ method has been taken care of, we still need to adjust our highlight method to allow it to create multiple tags, as it currently still only uses the keyword tag.

Luckily, only the beginning loop of this method will need altering:

    def highlight(self, event=None):
        length = tk.IntVar()
        for category in self.categories:
            matches = self.categories[category]['matches']
            for keyword in matches:
                start = 1.0
                ...

Everything after the ellipsis (...) here can be left the same as before, apart from being indented one level deeper so that it sits inside the new inner loop; it is mainly the outer loop which needs to change. We now iterate over each category in our categories attribute and grab its list of keywords by passing the category name back in, along with the 'matches' key.

All calls to tag_add should be using our category variable instead of the hard-coded keyword tag. Replace all calls to tag_add with this:

self.text_widget.tag_add(category, idx, end)
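The shape of the new nested loop can be seen in isolation with a small sketch. The generator name iter_keywords is hypothetical and the sample dictionary is a shortened stand-in for our parsed YAML; it simply yields each keyword alongside the category whose tag it should receive.

```python
def iter_keywords(categories):
    """Yield (category, keyword) pairs in the order highlight visits them."""
    for category, settings in categories.items():
        for keyword in settings["matches"]:
            yield category, keyword


categories = {
    "keywords": {"color": "orange", "matches": ["for", "def"]},
    "conditionals": {"color": "green", "matches": ["if", "else"]},
}
for category, keyword in iter_keywords(categories):
    print(category, keyword)
```

In the real method, the category from each pair is what gets passed to tag_add, so every keyword picks up the color configured for its own category.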

We finish up this class by adding some code which lets us run the file as an independent module:

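if __name__ == '__main__':
    w = tk.Tk()
    # The Text widget must be packed, otherwise the window will appear empty.
    text = tk.Text(w)
    text.pack(fill=tk.BOTH, expand=1)
    h = Highlighter(text, 'languages/python.yaml')
    w.mainloop()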

We can now launch this file with python3 highlighter.py and test it out. Try typing the words in your YAML file and watching them all change to their assigned color. Also try out numbers and strings. Perhaps copy the content of this file back into the window to see real Python code get highlighted.

Once we are satisfied that everything is working as intended, it's time to integrate this class back into our main TextEditor application.
