CHAPTER 5
What Is AI?

“A creature void of form”

Bob Dylan

In this section of the book, I will give you the core knowledge you need to understand AI. Before we jump into AI theory, we start with a discussion of what AI is, then compute power, which is as important as the AI itself, and then the history of AI and optimization. These chapters will give you the necessary prerequisites for learning AI. Then you will be ready to go deep into AI theory. Let's dive in.

Since John McCarthy and Marvin Minsky coined the term in 1956 at the Dartmouth Summer Research Project on Artificial Intelligence (DSRPAI), many over the past 66 years have tried to give airtight definitions of what is and what is not considered AI.

Short answer: When forced to give an answer, I typically say: “AI is any non‐trivial computer program that solves a complex problem.” This is too loose of a definition in my opinion, but I cannot think of a better one that would cleanly separate the things I deem AI from the things I do not. Let me defend this definition a bit more.

Long answer: To answer what is artificial intelligence you must first define intelligence.

Let's answer this question from a few different perspectives.

Intelligence

  1. Webster: Our first perspective is Webster's definition of intelligence: the ability to acquire and apply knowledge and skills.

    So with this definition, is a worm intelligent? Is my dog? Is Steve‐o from Jackass? Is Einstein? Which of these have this magical “ability,” and which do not? I believe all of these examples have it, but some have far more of it than others. So intelligence is just like height: it is not something you either have or lack; it is a scale from zero (no ability to acquire and apply knowledge and skills) to infinity. Well, if it is a scale, can it be measured like height?

  2. Psychometrics: Now for our second perspective. In the psychological literature, there are no fewer than 70 divergent answers to this question. The branch of psychology that deals with psychological measurement, attempting to quantify knowledge, abilities, attitudes, and personality traits, is called psychometrics, according to the American Psychological Association (APA). The predominant model for quantifying human intelligence is the CHC (Cattell–Horn–Carroll) model (also known as the three‐stratum theory), which breaks cognition into three levels: narrow abilities (stratum I), broad abilities (stratum II), and general abilities (stratum III) (see Figure 5.1).

    FIGURE 5.1 Cattell‐Horn‐Carroll (CHC) model.

    Source: ParanoidLemmings / https://da.wikipedia.org/wiki/Tre-stratumteorien / last accessed December 19, 2022 / Public Domain CC BY 4.0.

    • Stratum I (narrow level): very specific factors and skills, such as adding and subtracting, basic reading and writing skills.
    • Stratum II (broad abilities): eight broad abilities—fluid intelligence, crystallized intelligence, general memory and learning, broad visual perception, broad auditory perception, broad retrieval ability, broad cognitive speediness, and processing speed.
    • Stratum III (general intelligence): the g factor, which accounts for the correlations among the broad abilities at stratum II. It is a single‐value measure of a person's general intelligence or generalizability, e.g., a 1530 on the SAT.

      This is the basis of almost all the standardized testing you may have taken, such as the SAT, the ACT, and later the GRE. The g factor is estimated from these tests. They are designed to measure how well you generalize to previously unseen problems, while accounting for the generalization difficulty of each problem. They presume you have a high school level of education: that you know algebra and how to read and write. Assuming you know those things, these adaptive tests keep raising the generalization difficulty of the problems you are given as you answer more and more of them correctly, until the test is over. We introduced this topic of “generalizability” before, but let's provide a clear definition now.

  3. Generalizability: the ability to handle new, unseen situations (or tasks) that differ from previously encountered situations.
  4. AI/Computer Science: Now for the third perspective on intelligence: computer science. Many famous computer scientists, from Minsky to Turing, have discussed this topic. But one of my favorite modern AI researchers, Francois Chollet, wrote a recent paper called “On the Measure of Intelligence” that I believe to be the best modern discussion of the topic:

The intelligence of a system is a measure of its skill‐acquisition efficiency over a scope of tasks, with respect to priors, experience, and generalization difficulty. Intelligence is the efficiency with which a learning system turns experience and priors into skill at previously unknown tasks. As such, a measure of intelligence must account for priors, experience, and generalization difficulty. All intelligence is relative to a scope of application. Two intelligent systems may only be meaningfully compared within a shared scope and if they share similar priors.

There are a few important notes to take from this.

The first is the term “efficiency.” This is the speed at which one converts experience into a new skill. How many examples does it take for you to learn a new skill? Missing from both the Webster definition (“the ability to acquire and apply knowledge and skills”) and the definition of generalizability (“the ability to handle situations [or tasks] that differ from previously encountered situations”) is any notion of how quickly a learning system turns experience and priors into skill at previously unknown tasks. A smart person or algorithm or dog may only have to touch the hot stove once to know never to do it again. A less intelligent person or algorithm or dog may continue to do so hundreds of times before learning the lesson. While both demonstrated the ability, one clearly has more intelligence. Without accounting for the efficiency with which we learn, we cannot clearly measure the difference in intelligence between two agents.

The second is similar priors. By starting both agents from a similar knowledge base before you administer an exam, you can more tightly measure the difference in intelligence between them.

If two agents both start with equal prior knowledge, and one is able to generalize to more new problems more quickly than the other (with more efficiency), then we would say that agent is more intelligent.
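To make that concrete, here is a tiny toy sketch in Python. It is my own illustration, not Chollet's formal measure: two agents share the same priors and reach the same skill level on a previously unseen task, and the naive score is simply skill gained per example of experience.

# A toy illustration of skill-acquisition efficiency: both agents start from
# the same priors and reach the same skill on a new task, but one needed far
# less experience to get there. (The function and numbers are made up for
# illustration; real measures of intelligence are far subtler than this ratio.)
def skill_acquisition_efficiency(examples_seen: int, skill_reached: float) -> float:
    """Skill gained per example of experience (higher = more efficient)."""
    return skill_reached / examples_seen

# Say both agents eventually reach 90% accuracy on the unseen task.
agent_a = skill_acquisition_efficiency(examples_seen=10, skill_reached=0.9)
agent_b = skill_acquisition_efficiency(examples_seen=1_000, skill_reached=0.9)

print(f"agent A efficiency: {agent_a:.3f}")   # 0.090
print(f"agent B efficiency: {agent_b:.4f}")   # 0.0009
print("more intelligent (by this toy measure):", "A" if agent_a > agent_b else "B")

Crude as it is, the ratio captures why efficiency has to be part of any definition of intelligence: both agents “demonstrated the ability,” but only one did so with little experience.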

I believe the concepts of similar priors and efficiency are lacking in standardized testing today. If we are trying to compare test scores from two high school students with the following attributes:

  • One student had a private tutor their entire lives, went to the best preschool, grade school, and high school, studied for the SAT for 1,000 hours, grew up in a stable household without distraction or worry of starvation or homelessness or abuse;
  • Another went to the worst preschool, grade school, and high school, grew up homeless, never studied for the SAT, and was required to work directly after school each day instead of studying.

    If both of these students scored the same on the SAT, which would you believe to be more intelligent? I would choose the latter, without question. Achieving the same score from far weaker priors and with far less preparation demonstrates far greater generalization.

Now, with our new understanding of intelligence, let's go back to the example of chess and apply our new understanding of generalizability to the AI that beat Garry Kasparov.

Importance of Generalizability

Humankind has long held chess to be a very difficult game, and the best players in the world clearly possess remarkable intelligence. However (and I hope you are sitting down for this), by our Webster definition of intelligence, the program that beat Garry Kasparov had no intelligence! It had zero ability to acquire any knowledge or skill. It was a static program, running on a massive computer, that performed a simple tree search with a minimax operation at each step, which human engineers tweaked using their experience of chess. They added a ton of compute power and human knowledge to brute‐force search for the best answer. There was no learning at all; instead, the developers encoded their knowledge into the system. If you changed the rules of chess even a little (e.g., allow pawns to move backward, rearrange the starting positions, no longer let the queen move diagonally, or let pieces move twice every turn, aka double‐move chess), it would topple over and never recover unless the programmers changed the program meaningfully. This is why no broad AI capability or product came out of IBM's Deep Blue. The researchers were left with the shocking realization that building an AI chess champion did not require any generalizability or meaningful intelligence at all. The AI they built could not even generalize to other, similar board games without meaningful tweaks and customizations. I personally do believe this program should count as “AI,” which is why I do not agree with the Webster definition and prefer my own.
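To make “tree search and minimax” concrete, here is a minimal Python sketch using a tiny made‐up take‐away game. This is only my toy illustration, nothing like IBM's actual code, which searched chess positions to a limited depth with a heavily hand‐tuned evaluation function running on custom hardware; but the shape of the computation is the same.

# Minimax search of a toy game: players alternate removing 1, 2, or 3 stones
# from a pile, and whoever takes the last stone wins. All of the "knowledge"
# here is hard-coded in the rules and the search; nothing is learned.
def legal_moves(pile):
    """A player may remove 1, 2, or 3 stones (never more than remain)."""
    return [n for n in (1, 2, 3) if n <= pile]

def minimax(pile, maximizing):
    """Return +1 if the maximizing player can force a win from here, -1 otherwise."""
    if pile == 0:
        # The player to move has no stones to take: the previous player just won.
        return -1 if maximizing else +1
    scores = [minimax(pile - take, not maximizing) for take in legal_moves(pile)]
    return max(scores) if maximizing else min(scores)

# Exhaustively search every line of play from a pile of 10 stones.
print("first player can force a win:", minimax(10, True) == +1)

Notice that nothing in this program learns. Change the rules even slightly and the hand‐written parts must be rewritten by an engineer, which is exactly why Deep Blue toppled over outside the one game it was built for.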

Now let's compare that to my 14‐month‐old daughter. Last week, I was playing tennis with my wife. My daughter came over to me with a bottle of water and accidentally spilled it on the floor. She looked at me, went over to the seat and picked up a towel, walked all the way back over and started cleaning up the water, as seen in Figure 5.2.


FIGURE 5.2 My 14‐month‐old daughter drying a water spill.

I was stunned. I certainly had never shown her any examples of such behavior. If I were to try to get a robot to do this, it would easily take a team of five very good engineers a few years, and it still would not do it nearly as well or as robustly. The generalizability of humans is truly remarkable, and its bounds are still very much unknown.

This is an important theme. What is easy for us to figure out, like cleaning up a spill, is remarkably difficult for our AI models to figure out today. And what is difficult for us may be quite simple for AI to do.

Given our “limited hardware” (our brain), with limited memory and limited processing speed, chess is quite difficult. You have to model numerous scenarios and remember all of them while you are making your choice. It does show intelligence in humans, but it does not necessarily measure intelligence if given to a computer with unlimited memory and unlimited processing speed.

More to this point, why couldn't the same technique that was used to beat the world's chess champion be used to beat the world's Go champion? Why couldn't it generalize? Just add more compute, right? Well, we did not and still do not have unlimited memory and unlimited processing speed in our computers. While both increase every year, we have limitations. As we discussed in the introduction, we will never have enough memory or processing power to brute‐force search such a large number of possibilities, so we needed an AI model that could learn where to focus its attention and run tree searches only on those positions rather than on all of them. Using the old technique, there are not enough computers in the world to make that work, but the new technique is far more efficient. And because this new technique can be used to learn almost any game, AlphaGo is meaningfully more generalizable (and therefore more intelligent) than Deep Blue. However, AlphaGo was not possible with the hardware available in 1997. Not until Nvidia produced powerful enough GPUs were we able to make such a model work. This brings me to our next topic: compute!
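As a bridge to that discussion of compute, here is some rough back‐of‐the‐envelope arithmetic in Python. The branching factors (~35 legal moves per position in chess, ~250 in Go) and game lengths (~80 and ~150 plies) are commonly cited ballpark estimates, not exact figures.

# Approximate size of each game tree: (moves per position) ** (moves per game).
chess_tree = 35 ** 80
go_tree = 250 ** 150

# Express each as a power of ten. For reference, the observable universe is
# often estimated to contain roughly 10^80 atoms.
print(f"chess game tree ~ 10^{len(str(chess_tree)) - 1}")  # ~ 10^123
print(f"go game tree    ~ 10^{len(str(go_tree)) - 1}")     # ~ 10^359

Even the chess number is far beyond exhaustive search; Deep Blue only ever looked a limited number of moves ahead. Go's tree is so much larger still that a system has to learn which few moves are worth searching at all, which is precisely what AlphaGo's learned networks provided.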
