Chapter 20. The Goodies: Base and Standard Library

Julia comes with batteries included. The Base module contains the most useful functions, types, and macros. These are directly available in Julia.

Julia also provides a large number of specialized modules in its standard library for dates, distributed computing, linear algebra, profiling, random numbers, and more. Functions, types, and macros defined in the standard library have to be imported before they can be used:

  • import Module imports the module, and Module.fn(x) calls the function fn.

  • using Module imports all exported Module functions, types, and macros.

Additional functionality can be added from a growing collection of packages.

This chapter is not intended as a replacement of the official Julia documentation. The aim is merely to give some examples to illustrate what is possible, without being exhaustive. Functions already introduced elsewhere are not included.

Measuring Performance

We have seen that some algorithms perform better than others. The fibonnaci implementation from “Memos” is a lot faster than the fib implementation from “One More Example”. The @time macro allows us to quantify the difference:

julia> fib(1)
1
julia> fibonacci(1)
1
julia> @time fib(40)
  0.567546 seconds (5 allocations: 176 bytes)
102334155
julia> @time fibonacci(40)
  0.000012 seconds (8 allocations: 1.547 KiB)
102334155

@time prints the time the function took to execute, the number of allocations, and the allocated memory before returning the result. The memoized version is a lot faster but needs more memory.

There ain’t no such thing as a free lunch!

A function in Julia is compiled the first time it is executed. So, to compare two algorithms, they have to be implemented as functions to get compiled and the first time they are called has to be excluded from the performance measure; otherwise, the compilation time is measured as well.

The package BenchmarkTools provides the macro @btime that does benchmarking the right way. Use it!

Collections and Data Structures

In “Dictionary Subtraction” we used dictionaries to find the words that appear in a document but not in a word array. The function we wrote takes d1, which contains the words from the document as keys, and d2, which contains the array of words. It returns a dictionary that contains the keys from d1 that are not in d2:

function subtract(d1, d2)
    res = Dict()
    for key in keys(d1)
        if key  keys(d2)
            res[key] = nothing
        end
    end
    res
end

In all of these dictionaries, the values are nothing because we never use them. As a result, we waste some storage space.

Julia provides another built-in type, called a set, that behaves like a collection of dictionary keys with no values. Adding elements to a set is fast; so is checking membership. And sets provide functions and operators to compute common set operations.

For example, set subtraction is available as a function called setdiff. So we can rewrite subtract like this:

function subtract(d1, d2)
    setdiff(d1, d2)
end

The result is a set instead of a dictionary.

Some of the exercises in this book can be done concisely and efficiently with sets. For example, here is a solution to “Exercise 10-7” that uses a dictionary:

function hasduplicates(t)
    d = Dict()
    for x in t
        if x  d
            return true
        end
        d[x] = nothing
    end
    false
end

When an element appears for the first time, it is added to the dictionary. If the same element appears again, the function returns true.

Using sets, we can write the same function like this:

function hasduplicates(t)
    length(Set(t)) < length(t)
end

An element can only appear in a set once, so if an element in t appears more than once, the set will be smaller than t. If there are no duplicates, the set will be the same size as t.

We can also use sets to do some of the exercises in Chapter 9. For example, here’s a version of usesonly with a loop:

function usesonly(word, available)
    for letter in word
        if letter  available
            return false
        end
    end
    true
end

usesonly checks whether all the letters in word are in available. We can rewrite it like this:

function usesonly(word, available)
    Set(word)  Set(available)
end

The (subseteq TAB) operator checks whether one set is a subset of another, including the possibility that they are equal, which is true if all the letters in word appear in available.

Exercise 20-1

Rewrite avoids using sets.

Mathematics

Complex numbers are also supported in Julia. The global constant im is bound to the complex number i, representing the principal square root of -1.

We can now verify Euler’s identity:

julia> ^(im*π)+1
0.0 + 1.2246467991473532e-16im

The symbol (euler TAB) is the base of natural logarithms.

Let’s illustrate the complex nature of trigonometric functions:

cosx=eix+e-ix2

We can test this formula for different values of x:

julia> x = 0:0.1:2π
0.0:0.1:6.2
julia> cos.(x) == 0.5*(.^(im*x)+.^(-im*x))
true

Here, another example of the dot operator is shown. Julia also allows numeric literals to be juxtaposed with identifiers as coefficients, as in .

Strings

In Chapters 8 and 9, we did some elementary searches in string objects. Julia can also handle Perl-compatible regular expressions (regexes), which eases the task of finding complex patterns in string objects.

The usesonly function can be implemented as a regex:

function usesonly(word, available)
  r = Regex("[^$(available)]")
  !occursin(r, word)
end

The regex looks for a character that is not in the available string and occursin returns true if the pattern is found in word:

julia> usesonly("banana", "abn")
true
julia> usesonly("bananas", "abn")
false

Regexes can also be constructed as nonstandard string literals prefixed with r:

julia> match(r"[^abn]", "banana")

julia> m = match(r"[^abn]", "bananas")
RegexMatch("s")

String interpolation is not allowed in this case. The match function returns nothing if the pattern (a command) is not found and a RegexMatch object otherwise.

We can extract the following information from a RegexMatch object:

  • The entire substring matched (m.match)

  • The captured substrings as an array of strings (m.captures)

  • The offset at which the whole match begins (m.offset)

  • The offsets of the captured substrings as an array (m.offsets)

For example:

julia> m.match
"s"
julia> m.offset
7

Regexes are extremely powerful and the perlre mainpage provides all the details to construct the most exotic searches.

Arrays

In Chapter 10 we used an array object as a one-dimensional container with an index to address its elements. In Julia, however, arrays are multidimensional collections, or matrices.

Let’s create a 2-by-3 zero matrix:

julia> z = zeros(Float64, 2, 3)
2×3 Array{Float64,2}:
 0.0  0.0  0.0
 0.0  0.0  0.0
julia> typeof(z)
Array{Float64,2}

The type of this matrix is an array holding floating points and having two dimensions.

The size function returns a tuple with as elements the number of elements in each dimension:

julia> size(z)
(2, 3)

The function ones constructs a matrix with unit value elements:

julia> s = ones(String, 1, 3)
1×3 Array{String,2}:
 ""  ""  ""

The string unit element is an empty string.

s is not a one-dimensional array:

julia> s ==  ["", "", ""]
false

s is a row matrix and ["", "", ""] is a column matrix.

A matrix can be entered directly using a space to separate elements in a row and a semicolon (;) to separate rows:

julia> a = [1 2 3; 4 5 6]
2×3 Array{Int64,2}:
 1  2  3
 4  5  6

You can use square brackets to address individual elements:

julia> z[1,2] = 1
1
julia> z[2,3] = 1
1
julia> z
2×3 Array{Float64,2}:
 0.0  1.0  0.0
 0.0  0.0  1.0

Slices can be used for each dimension to select a subgroup of elements:

julia> u = z[:,2:end]
2×2 Array{Float64,2}:
 1.0  0.0
 0.0  1.0

The . operator broadcasts to all dimensions:

julia> .^(im*u)
2×2 Array{Complex{Float64},2}:
 0.540302+0.841471im       1.0+0.0im
      1.0+0.0im       0.540302+0.841471im

Interfaces

Julia specifies some informal interfaces to define behaviors—i.e., methods with a specific goal. When you extend such a method for a type, objects of that type can be used to build upon these behaviors.

If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.

Looping over the values of a collection (iteration) is such an interface. In “One More Example” we implemented the fib function, returning the nth element of the Fibonacci sequence. Let’s make an iterator that lazily returns the Fibonacci sequence:

struct Fibonacci{T<:Real} end
Fibonacci(d::DataType) = d<:Real ? Fibonacci{d}() : error("No Real type!")

Base.iterate(::Fibonacci{T}) where {T<:Real} = (zero(T), (one(T), one(T)))
Base.iterate(::Fibonacci{T}, state::Tuple{T, T}) where {T<:Real} = (state[1], (state[2], state[1] + state[2]))

Here, we implemented a parametric type with no fields (Fibonacci), an outer constructor, and two iterate methods. The first is called to initialize the iterator and returns a tuple consisting of the first value, 0, and a state. The state in this case is a tuple containing the second and third values, 1 and 1.

The second method is called to get the next value of the Fibonacci sequence. It returns a tuple having as its first element the next value and as its second element the state, which is a tuple with the two following values.

We can use Fibonacci now in a for loop:

julia> for e in Fibonacci(Int64)
           e > 100 && break
           print(e, " ")
       end
0 1 1 2 3 5 8 13 21 34 55 89

It looks like magic has happened, but the explanation is simple. A for loop in Julia:

for i in iter
    # body
end

is translated into:

next = iterate(iter)
while next !== nothing
    (i, state) = next
    # body
    next = iterate(iter, state)
end

This is a great example of how a well-defined interface allows an implementation to use all the functions that are aware of the interface.

Interactive Utilities

We already met the InteractiveUtils module, in “Debugging”. But the @which macro is only the tip of the iceberg.

The LLVM library transforms Julia code into machine code—instructions that the computer’s CPU can execute directly—in multiple steps. We can directly visualize the output of each stage.

Let’s take a look at a simple example:

function squaresum(a::Float64, b::Float64)
    a^2 + b^2
end

The first step is to look at the “lowered” code:

julia> using InteractiveUtils

julia> @code_lowered squaresum(3.0, 4.0)
CodeInfo(
1 ─ %1 = (Core.apply_type)(Base.Val, 2)
│   %2 = (%1)()
│   %3 = (Base.literal_pow)(:^, a, %2)
│   %4 = (Core.apply_type)(Base.Val, 2)
│   %5 = (%4)()
│   %6 = (Base.literal_pow)(:^, b, %5)
│   %7 = %3 + %6
└──      return %7
)

The @code_lowered macro returns an array of an intermediate representation of the code that is used by the compiler to generate optimized code.

The next step adds type information:

julia> @code_typed squaresum(3.0, 4.0)
CodeInfo(
1 ─ %1 = (Base.mul_float)(a, a)::Float64
│   %2 = (Base.mul_float)(b, b)::Float64
│   %3 = (Base.add_float)(%1, %2)::Float64
└──      return %3
) => Float64

We see that the type of the intermediate results and the return value are correctly inferred.

This representation of the code is now transformed into LLVM code:

julia> @code_llvm squaresum(3.0, 4.0)
;  @ none:2 within `squaresum'
define double @julia_squaresum_14823(double, double) {
top:
; ┌ @ intfuncs.jl:243 within `literal_pow'
; │┌ @ float.jl:399 within `*'
    %2 = fmul double %0, %0
    %3 = fmul double %1, %1
; └└
; ┌ @ float.jl:395 within `+'
   %4 = fadd double %2, %3
; └
  ret double %4
}

And finally the machine code is generated:

julia> @code_native squaresum(3.0, 4.0)
	.section	__TEXT,__text,regular,pure_instructions
; ┌ @ none:2 within `squaresum'
; │┌ @ intfuncs.jl:243 within `literal_pow'
; ││┌ @ none:2 within `*'
	vmulsd	%xmm0, %xmm0, %xmm0
	vmulsd	%xmm1, %xmm1, %xmm1
; │└└
; │┌ @ float.jl:395 within `+'
	vaddsd	%xmm1, %xmm0, %xmm0
; │└
	retl
	nopl	(%eax)
; └

Debugging

The Logging macros provide an alternative to scaffolding with print statements:

julia> @warn "Abandon printf debugging, all ye who enter here!"
┌ Warning: Abandon printf debugging, all ye who enter here!
└ @ Main REPL[1]:1

The debug statements don’t have to be removed from the source. For example, in contrast to the @warn above, this will produce no output by default:

julia> @debug "The sum of some values $(sum(rand(100)))"

In this case, sum(rand(100)) will never be evaluated unless debug logging is enabled to store the debug messages in a log file.

The level of logging can be selected by an environment variable, JULIA_DEBUG:

$ JULIA_DEBUG=all julia -e '@debug "The sum of some values $(sum(rand(100)))"'
┌ Debug: The sum of some values 47.116520814555024
└ @ Main none:1

Here, we have used all to get all debug information, but you can also choose to generate only output for a specific file or module.

Glossary

set

A collection of distinct objects.

regex

A regular expression, or sequence of characters that define a search pattern.

matrix

A two-dimensional array.

machine code

Language instructions that can be executed directly by a computer’s central processing unit.

intermediate representation

A data structure used internally by a compiler to represent source code.

debug logging

Storing debug messages in a log.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset