Exploring Natural Sorting in Python


 

Image by Author
 

What Is Natural Sorting, And Why Do We Need It?

 

When working with Python iterables such as lists, sorting is a common operation you’ll perform. To sort a list, you can use the list method sort() to sort it in place, or the sorted() function, which returns a new sorted list.

The sorted() function works fine when you have a list of numbers or strings containing only letters. But what about strings containing alphanumeric characters, such as filenames, directory names, version numbers, and more? The sorted() function performs lexicographic sorting.

Take a look at this simple example:

# List of filenames
filenames = ["file10.txt", "file2.txt", "file1.txt"]

sorted_filenames = sorted(filenames)
print(sorted_filenames)

 

You’ll get the following output:

Output >>> ['file1.txt', 'file10.txt', 'file2.txt']

 

Well, ‘file10.txt’ comes before ‘file2.txt’ in the output. That is not the intuitive sorting order we’re hoping for. This is because the sorted() function compares strings character by character using the code points of the characters (the same as their ASCII values here) and not the numeric values of the digits. Enter natural sorting.

Natural sorting is a sorting technique that arranges elements in a way that reflects their natural order, particularly for alphanumeric data. Unlike lexicographic sorting, natural sorting interprets the numerical value of digits within strings and arranges them accordingly, resulting in a more meaningful and expected sequence.
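
To make the idea concrete, here is a minimal sketch of a natural sort key built only with the standard library. It is an illustration of the technique, not how natsort is implemented; the helper name natural_key is hypothetical, and the sketch assumes strings with plain unsigned integers in them.

```python
import re

def natural_key(text):
    """Split text into non-digit and digit runs; digit runs become ints."""
    # re.split with a capturing group keeps the digit runs in the result,
    # so parts alternate: text, digits, text, digits, ...
    parts = re.split(r"(\d+)", text)
    # Odd positions always hold the digit runs captured by the group
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]

filenames = ["file10.txt", "file2.txt", "file1.txt"]

# Plain string comparison goes character by character: "1" sorts before "2"
print("file10.txt" < "file2.txt")   # True

# With the natural key, 10 and 2 are compared as numbers
natural_order = sorted(filenames, key=natural_key)
print(natural_order)                # ['file1.txt', 'file2.txt', 'file10.txt']
```

A library such as natsort covers many more cases (signs, floats, case handling, locale), which is why the rest of this tutorial uses it instead of a hand-rolled key.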

In this tutorial, we’ll explore natural sorting with the Python library natsort.

 

Getting Started

 

To get started, you can install the natsort library using pip:

$ pip3 install natsort

As a best practice, install the required package in a virtual environment for the project. Because natsort requires Python 3.7 or later, make sure you’re using a recent Python version, preferably Python 3.11 or later. To learn how to manage different Python versions, read Too Many Python Versions to Manage? Pyenv to the Rescue.

 

Natural Sorting Basic Examples

 
We’ll start with simple use cases where natural sorting is helpful:

  • Sorting file names: When working with file names containing digits, natural sorting ensures that files are ordered in the natural, intuitive order.
  • Version sorting: Natural sorting is also helpful for ordering strings of version numbers, ensuring that versions are sorted based on their numerical values rather than their ASCII values, which might not reflect the desired versioning sequence.

Now let’s proceed to code these examples.

 

Sorting Filenames

 
Now that we’ve installed the natsort library, we can import it into our Python script and use the different functions that the library offers.

Let’s revisit the first example of sorting file names (the one we saw at the beginning of the tutorial), where the lexicographic sorting with the sorted() function was not what we wanted.

Now let’s sort the same list using the natsorted() function like so:

import natsort

# List of filenames
filenames = ["file10.txt", "file2.txt", "file1.txt"]

# Sort filenames naturally
sorted_filenames = natsort.natsorted(filenames)
print(sorted_filenames)

 

In this example, the natsorted() function from the natsort library is used to sort the list of file names naturally. As a result, the file names are arranged in the expected numerical order:

Output >>> ['file1.txt', 'file2.txt', 'file10.txt']

 

Sorting Version Numbers

 
Let’s take another similar example where we have strings denoting versions:

import natsort

# List of version numbers
versions = ["v-1.10", "v-1.2", "v-1.5"]

# Sort versions naturally
sorted_versions = natsort.natsorted(versions)

print(sorted_versions)

 

Here, the natsorted() function is applied to sort the list of version numbers naturally. The resulting sorted list maintains the correct numerical order of the versions:

Output >>> ['v-1.2', 'v-1.5', 'v-1.10']

 

Customizing Sorting with a Key

 

When using the built-in sorted() function, you might have used the key parameter to customize the sort order. Similarly, the natsorted() function also takes the optional key parameter, which you can use to sort based on specific criteria.

Let’s take an example: we have file_data, which is a list of tuples. The first element in each tuple (at index 0) is the file name and the second element (at index 1) is the size of the file.

Say we want to sort based on the file size in ascending order. So we set the key parameter to lambda x: x[1] so that the file size at index 1 is used as the sorting key:

import natsort

# List of tuples containing filename and size
file_data = [
("data_20230101_080000.csv", 100),
("data_20221231_235959.csv", 150),
("data_20230201_120000.csv", 120),
("data_20230115_093000.csv", 80)
]

# Sort file data based on file size
sorted_file_data = natsort.natsorted(file_data, key=lambda x: x[1])

# Print sorted file data
for filename, size in sorted_file_data:
    print(filename, size)

 

Right here’s the output:

data_20230115_093000.csv 80
data_20230101_080000.csv 100
data_20230201_120000.csv 120
data_20221231_235959.csv 150

 

Case-Insensitive Sorting of Strings

 

Another use case where natural sorting is helpful is when you need case-insensitive sorting of strings. Again, lexicographic sorting based on ASCII values will not give the desired results, because uppercase letters sort before lowercase ones.

To perform case-insensitive sorting, we can set alg to natsort.ns.IGNORECASE, which will ignore the case when sorting. The alg parameter controls the algorithm that natsorted() uses:

import natsort

# List of strings with mixed case
words = ["apple", "Banana", "cat", "Dog", "Elephant"]

# Sort words naturally with case-insensitivity
sorted_words = natsort.natsorted(words, alg=natsort.ns.IGNORECASE)

print(sorted_words)

 

Here, the list of words with mixed case is sorted naturally with case-insensitivity:

Output >>> ['apple', 'Banana', 'cat', 'Dog', 'Elephant']
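
For comparison, the following sketch shows what the built-in sorted() does with the same list, using only the standard library. Uppercase letters have lower code points than lowercase ones, so the capitalized words come first unless a case-folding key is supplied:

```python
# List of strings with mixed case
words = ["apple", "Banana", "cat", "Dog", "Elephant"]

# Default lexicographic sort: uppercase letters sort before lowercase ones
default_order = sorted(words)
print(default_order)   # ['Banana', 'Dog', 'Elephant', 'apple', 'cat']

# Case-insensitive sort using a case-folding key
folded_order = sorted(words, key=str.casefold)
print(folded_order)    # ['apple', 'Banana', 'cat', 'Dog', 'Elephant']
```

For plain words like these, key=str.casefold is enough. The alg option in natsort is most useful when the strings also contain numbers, so that case-insensitivity and natural ordering are applied together.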

 

Wrapping Up

 

And that's a wrap! In this tutorial, we reviewed the limitations of lexicographic sorting and how natural sorting can be a good alternative when working with alphanumeric strings. You can find all the code on GitHub.

We started with simple examples and also looked at sorting based on custom keys and handling case-insensitive sorting in Python. Next, you may explore other capabilities of the natsort library. I’ll see you all soon in another Python tutorial. Until then, keep coding!

 

 

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.


