Days Seven and Eight of a Predictive Coding Narrative: Where I have another hybrid mind-meld and discover that the computer does not know God

July 29, 2012

This is my fifth in a series of narrative descriptions of  a search of 699,082 Enron emails to find evidence on involuntary employee terminations. The preceding narratives are:

In this fifth installment I will continue my description, this time covering days seven and eight of the project. As the title indicates, progress continues and I have another hybrid mind-meld moment. I also discover that the computer does not recognize the significance of references to God in an email. This makes sense logically, but is unexpected and kind of funny when encountered in a document review.

Seventh Day of Review (7 Hours)

this seventh day I followed Joe White’s advice as described at the end of the last narrative. It was essentially a three-step process:

One: I ran another learning session for the dozen or so I’d marked since the last one to be sure I was caught up, and then made sure all of the prior Training documents were checked back in. This only took a few minutes.

Two: I ran two more focus document trainings of 100 docs each, total 200. The focus documents are generated automatically by the computer. It only took about an hour to review these 200 documents because most were obviously irrelevant to me, even if the computer was somewhat confused.

I received more of an explanation from Joe White on the focus documents, as Inview calls them. He explains that, at the current time at least (KO is coming out with a new version of the Inview software soon, and they are in a state of constant analysis and improvement), 90% of each focus group consists of grey area type documents, and 10% are pure random under IRT ranking. For documents drawn via workflow (in the demo database they are drawn from the System Trainers group in the Document Training Stage) they are selected as 90% focus and 10% random; where the 90% focus selection is drawn evenly across each category set for iC training.

The focus documents come from the areas of least certainty for the algorithm. A similar effect can be achieved by searching for a given iC category for documents between 49 – 51%, etc., as I had done before for relevance. But the automated focus document system makes it a little easier because it knows when you do not have enough documents in the 49 – 51% probability range and then increases the draw to reach your specified number, here 100,  to the next least-certain documents. This reduces the manual work in finding the grey area documents for review and training.

Three: I looked for more documents to evaluate/train the system. I had noticed that “severance” was a key word in relevant documents, and so went back and ran a search for this term for the first time. There were 3,222 hits, so, as per my standard procedure, I added this document count to name of the folder that automatically saved the search.

I found many more relevant documents that way. Some were of a new type I had not seen before (having to do with the mass lay-offs when Enron was going under), so I knew I was expanding the scope of relevancy training, as was my intent. I did the judgmental review by using various sort-type judgment searches in that folder, i.e. by ordering the documents by subject line, file type, search terms hits (the star symbols), etc., and did not review all 3,222 docs. I did not find that necessary. Instead, I honed in on the relevant docs, but also marked some irrelevant ones here that were close. Below is a screen shot of the first page of the documents sorted by putting those selected for training at the top.

I had also noticed that “lay off” “lay offs” and “laid off” were common terms found in relevant docs, and I had not searched for those particular terms before either. There were 973 documents with hits with one of these search terms. I did the same kind of judgmental search of the folder I created with these documents and found more relevant documents to train. Again, I was finding new documents and knew that I was expanding the scope of relevancy. Below is one new relevant document found in this selection; note how the search terms are highlighted for easy location.

I also took the time to mark some irrelevant documents in these new search folders, especially the documents in the last folder, and told them to train too, since they were otherwise close from a similar keywords perspective. So I thought I should go ahead and train them to try to teach the fine distinctions.

The above third step took another five hours (six hours total). I knew I had added hundreds of new docs for training in the past five hours, both relevant and irrelevant.

Fourth Round

I decided it was time to run a training session again and force the software to analyze and rank all of the documents again. This was essentially the Fourth Round (not counting the little training I did at the beginning today to make sure I was served with the right (updated) Focus documents).

After the Training session completed, I asked for a report. It showed that 2,663 total documents (19,731 pages) have now been categorized and marked for Training in this last session. There were now 1,156 Trainer (me) identified documents, plus the original 1,507 System ID’ed docs. (Previously, in Round 3, there were the same 1,507 System ID’ed docs, and only 534 Trainer ID’ed docs.)

Then I ran a report to see how many docs had been categorized by me as Relevant (whether also marked for Training or not). Note I could have done this before the training session too, and it would not make any difference in results. All the training session does is change the predictions on coding, not the actual prior human coding. This relevancy search was saved in another search folder called “All Docs Marked Relevant after 4th Round – 355 Docs.” After the third round I had only ID’ed 137 relevant documents. So progress in recall was being made.

Prevalence Quality Control Check

As explained in detail in Day Two of a Predictive Coding Narrative: More Than A Random Stroll Down Memory Lane, my first random sample search allowed me to determine prevalence and get an idea of the total number of relevant document likely contained in the database. The number was 928 documents. That was the spot or point projection of the total yield in the corpus. (Yield is another information science and statistics term that is useful to know. It means in this context the expected number of relevant documents in the total database. See eg. Webber, W., Approximate Recall Confidence Intervals, ACM Transactions on Information Systems, Vol. V, No. N, Article A (2012 draft) at A2.)

My yield calculation here of 928 is based on my earlier finding of 2 relevant documents in the initial 1,507 random sample. (2/1507=.00132714) (.13*699,082=928 relevant documents). So based on this I knew that I was correct to have gone ahead with the fourth round, and would next check to see how many documents the IRT now predicted would be relevant. My hope was the number would now be closer to the 928 goal of the projected yield of the 699,082 document corpus.

This last part had taken another hour, so I’ll end Day Seven with a total of 7 hours of search and review work.

Eighth Day of Review (9 Hours)

First I ran a probability search as before for all 51%+ probable relevant docs and saved them in a folder by that name. After the fourth round the IRC now predicted a total of 423 relevant documents. Remember I had already actually reviewed and categorized 355 docs as relevant, so this was only a potential max net gain of 68 docs. As it turned out, I disagreed with 8 of the predictions, so the actual net gain was only 60 docs, for a total of 415 confirmed relevant documents.

I had hoped for more after broadening the scope of documents marked relevant in the last seeding. So I was a little disappointed that my last seed set had not led to more predicted relevant. Since the “recall goal” for this project was 928 documents, I knew I still had some work to do to expand the scope. Either that or the confidence interval was at work, and there were actually fewer relevant documents in this collection than the random sample predicted as a point projection. The probability statistics showed that the actual range was between 112 documents 3,345 documents, due to the 95% confidence level and +/-3% confidence interval.

51%+ Probable Relevant Documents

Next I looked at the 51%+ probable relevant docs folder and sorted by whether the documents had been categorized on not. You do that by clicking on the symbol for categorization, a check, which is by default located in the upper left. That puts all of the categorized docs together, either on top or bottom. Then I reviewed the 68 new documents, the ones the computer predicted to be relevant that I had not previously marked relevant.

This is always the part of the review that is the most informative for me as to whether the computer is actually “getting-it” or not. You look to see what documents it gets wrong, in other words, makes a wrong prediction of probable relevance, and try to determine why. In this way you can be alert for additional documents to try to correct the error in future seeds. You learn from the computer’s mistakes where additional training is required.

I then had some moderately good news in my review. I only disagreed with eight of the 68 new predictions. One of these documents only had a 52.6% probability for relevance, another 53.6%, another 54.5%, another 54%, another 57.9%, and another other only 61%.  Another two were 79.2% and 76.7% having to do with “voluntary” severance again, a mistake I had seen before. So even when the computer and I disagreed, it was not by much.

Computer Finds New Hard-to-Detect Relevant Documents

A couple of the documents that Inview predicted to be relevant were long, many pages, so my study and analysis of them took a while. Even though these long documents at first seemed irrelevant to me, as I kept reading and analyzing them, I ultimately agreed with the computer on all of them. A careful reading of the documents showed that they did in fact include discussion related to termination and terminated employees. I was surprised to see that, but pleased, as it showed the software mojo was kicking in. The predictive coding training was allowing the computer to find documents I would likely never have caught on my own. The mind-meld was working and hybrid power was again manifest.

These hard to detect issues (for me) mainly arose from the unusual situation of the mass terminations that came at the end of Enron, especially at the time of its bankruptcy. To be honest, I had forgotten about those events. My recollection of Enron history was pretty rusty when I started this project. I had not been searching for bankruptcy related terminations before. That was entirely the computer’s contribution and it was a good one.

From this study of the 68 new docs I realized that although there were still some issues with the software making an accurate distinction between voluntary and involuntary severance, overall, I felt pretty confident that Inview was now pretty well-trained. I based that on the 60 other predictions that were spot on.

Note that I marked most of the newly confirmed relevant documents for training, but not all. I did not want to excessively weight the training with some that were redundant, or odd for one reason or another, and thus not particularly instructive.

This work was fairly time-consuming. It took three long hours on a Sunday to complete.

Fifth Round

Returning to work in the evening I started another training session, the Fifth. This would allow the new teaching (document training instructions) to take effect.

My plan was to then have the computer serve me up the 100 close calls (Focus Documents) by using the document training Checkout feature. Remember this feature selects and serves up for review the grey area docs designed to improve the IRT training, plus random samples.

But before I reviewed the next training set, I did a quick search to see how many new relevant documents (51%+) the last training (fifth round) has predicted. I found a total of 545 documents 51%+ predicted relevant. Remember I left the last session with 415 relevant docs (goal is 928). So progress was still being made. The computer had added 130 documents.

Review of Focus Documents

Before I looked at these new ones to see how many I agreed with, I stuck to my plan, and took a Checkout feed of 100 Focus documents. My guess is that most of the newly predicted 51%+ relevant docs would be in the grey area anyway, and so I’ll be reviewing some of them when I reviewed the Focus documents.

First, I noticed right away that it served up 35 irrelevant junk files that were obviously irrelevant and previously marked as such, such as PST placeholder files, and a few others like that, which clutter this ENRON dataset. Obviously, they were part of the random selection part of the Focus document selections. I told them all to train in one bulk command, hit the completed review button for them, and then focused on the remaining 65 documents. None had been reviewed before. Next I found some more obviously irrelevant docs, which were not close at all, i.e. 91% irrelevant and only 1% likely relevant. I suspect this is part of the general database random selection that makes up 10% of the Focus documents (the other 90% are close calls).

Next I did a file type sort to see if any more of the unreviewed documents in this batch of 100 were obviously irrelevant based on file type. I found 8 more such files, mass categorized them, mass trained them and quickly completed review for these 8.

Now there were 57 docs left, 9 of which were Word docs, and the rest emails. So I checked the 9 word docs next. Six of these were essentially the same document called “11 15 01 CALL.doc.” The computer gave each approximately a 32.3% probability of irrelevance and a 33.7% probability of relevance. Very close indeed. Some of the other docs had very slight prediction numbers (less than 1%). The documents proved to be very close calls. Most of them I found to be irrelevant. But in one document I found a comment about mass employee layoffs, so I decided to call it relevant to our issue of employee terminations. I trained those eight and checked them back in. I then reviewed the remaining word docs, found that they were also very close, but marked these as irrelevant and checked them in, leaving 48 docs left to review in the Training set of 100.

Next I noticed a junk kind of mass email from a sender called “Black.” I sorted by “From” found six by Black, and a quick look showed they were all irrelevant, as the computer had predicted for each. Not sure why they were picked as focus docs, but regardless, I trained them and checked them back in, now leaving 42 docs to review.

Next I sorted the remaining by “Subject” to look for some more that I might be able to quickly bulk code (mass categorize). It did not help much as there were only a couple of strings with the same subject. But I kept that subject order and sloughed through the remaining 42 docs.

I found most of the remaining docs were very close calls, all in the 30% range for both relevant and irrelevant. So they were all uncertain, i.w. a split choice, but none were actually predicted relevant, that is, none were in the over 50% likely relevant range. I found that most of them were indeed irrelevant, but not all. A few in this uncertain range were relevant. They were barely relevant, but of the new type recently marked having to do with the bankruptcy. Others that I found relevant were of a type I had seen before, yet the computer was still unsure with basically an even split of prediction in the 30% range. They were apparently different from the obviously relevant documents, but in a subtle way. I was not sure why. See Eg: control number 12509498.

It was 32.8% relevant and 30.9% irrelevant, even though I had marked an identical version of this email before as relevant in the last training. The computer was apparently suspicious of my prior call and was making sure. I know I’m anthropomorphizing a machine, but I don’t know how else to describe it.

Computer’s Focus Was Too Myopic To See God

One of the focus documents that the computer found a close call in the 30% range was email with control number 10910388. It was obviously just an inspirational message being forwarded around about God. You know the type I’m sure.

It was kind of funny to see that this email confused the computer, whereas any human could immediately recognize that this was a message about God, not employee terminations. It was obvious that the computer did not know God.

Suddenly My Prayers Are Answered

Right after the funny God mistake email, I reviewed another email with control number 6004505. It was about wanting to fire a particular employee. Although the computer was uncertain about the relevancy of this document, I knew right away that it rocked. It was just the kind of evidence I had been looking for. I marked it as Highly Relevant, the first hot document found in several sessions. Here is the email.

I took this discovery of a hot doc as a good sign. I was finding both the original documents I had been looking for and the new outliers. It looked to me like I had succeeded in training and in broadening the scope of relevancy to its proper breadth. I might not be travelling a divine road to redemption, but it was clearly leading to better recall.

Since most of these last 42 documents were all close questions (some were part of the 10% random and were obvious), the review took longer than usual. The above tasks all took over 1.5 hours (not including machine search time or time to write this memo).

Good Job Robot!

My next task was to review the 51% predicted relevant set of 545 docs. One document was particularly interesting, control number 12004849, which was predicted to be 54.7% likely relevant. I had previously marked it Irrelevant based on my close call decision that it only pertained to voluntary terminations, not involuntary terminations. It was an ERISA document, a Summary Plan Description of the Enron Metals Voluntary Separation Program.

Since the document on its face obviously pertained to voluntary separations, it was not relevant. That was my original thinking and why I at first called it Irrelevant. But my views on document characterizations on that fuzzy line between voluntary and involuntary employee terminations had changed somewhat over the course of the review project. I now had a better understanding of the underlying facts. The document necessarily defined both eligibility for this benefit, money when an employee left, and ineligibility. It specifically stated that employees of certain Enron entities were ineligible for this benefit. It stated that acceptance of an application was strictly within the company’s discretion. What happened if even an eligible employee decided not to voluntarily quit and take this money? Would they not then be terminated involuntarily? What happened if they applied for this severance, and the company said no? For all these reasons, and more, I decided that this document was in fact relevant to both voluntary and involuntary terminations. The relevance to involuntary terminations was indirect, and perhaps a bit of a stretch, but in my mind it was in the scope of a relevant document.

Bottom line, I had changed my mind and I now agreed with the computer and considered it Relevant. So I changed the coding to relevant and trained on it. Good call Inview. It had noticed an inconsistency with some of my other document codings and suggested a correction. I agreed. That was impressive. Good robot!

Looking at the New 51%+

Another one of the new documents that was in the 51%+ predicted relevant group was a document with 42 versions of itself. It was the Ken Lay email where he announced that he was not accepting his sixty-million dollar golden parachute. (Can you imagine how many law suits would have ensued if he took that money?) Here is one of the many copies of this email.

I had previously marked a version of this email as relevant in past rounds. Obviously the corpus (the 699,082 Enron emails) had more copies of that particular email that I had not found before. It was widely circulated. I confirmed the predictions of Relevance.  (Remember that this database was deduplicated only on the individual custodian basis, vertical deduplication. It was not globally deduplicated against all custodians, horizontal deduplication. I recommend full horizontal deduplication as a default protocol.)

I disagreed with many of the other predicted relevant docs, but did not consider any of them important. The documents now presenting as possibly relevant were, in my view, cumulative and not really new, not really important. All were fetched by the outer limits of relevance triggered by my previously allowing in as barely relevant the final day comments on Ken Lay’s not taking a sixty-million dollar payment, and also allowing in as relevant general talk during bankruptcy that might mention layoffs.

Also, I was allowing in as relevant new documents and emails that concerned the ERISA plan revisions that were related to general severance. The SPD of the Enron Metals Voluntary Separation Program was an example of that. These were all fairly far afield of my original concept of relevance, which had grown as I saw all of the final days emails regarding layoffs, and better understood the bankruptcy and ERISA set up, etc.

Bottom line, I did not see much training value in these newly added docs, both predicted and confirmed. The new documents were not really new. They were very close to documents already found in the prior rounds. I was thinking it might be time to bring this search to an end.

Latest Relevancy Metrics

I ran one final search to determine my total relevant coded documents. The count was 659. That was a good increase over the last measured count of 545 relevant, but still short of my initial goal of 928, the point projection of yield. That is a 71% recall (659/928) of my target, which is pretty good, especially if the remaining relevant were just cumulative or otherwise not important. Considering the 3% confidence interval, and the range inherent in the 928 yield point projection because of that, from between 112 and 3,345 documents, it could in fact already be 100% recall, although I doubted that based on the process to date. See references to point projection, intervals, and William Webber’s work on confidence intervals in Day Two of a Predictive Coding Narrative: More Than A Random Stroll Down Memory Lane and in Webber, W., Approximate Recall Confidence Intervals, ACM Transactions on Information Systems, Vol. V, No. N, Article A (2012 draft).

Enough Is Enough

I was pretty sure that further rounds of search would lead to the discovery of more relevant documents, but thought it very unlikely that any more significant relevant documents would be found. Although I had found one hot doc in this round, the quality of the rest of the documents found convinced me that was unlikely to occur again. I had the same reaction to the grey area documents. The quality had changed. Based on what I had been seeing in the last two rounds, the relevant documents left were, in my opinion, likely cumulative and of no real probative value to the case.

In other words, I did not see value in continuing the search and review process further, except for a final null-set quality control check. I decided to bring the search to end. Enough is enough already. Reasonable efforts are required, not perfection. Besides, I knew there was a final quality control test to be passed, and that it would likely reveal any serious mistakes on my part.

Moving On to the Perhaps-Final Quality Control Check

After declaring the search to be over, the next step in the project was to take a random sample of the documents not reviewed or categorized, to see if any significant false-negatives turned up. If none did, then I would  consider the project a success, and conclude that more rounds of search were not required. If some did turn up, then I would have to keep the project going for at least another round, maybe more, depending on exactly what false-negatives were found. That would have to wait for the next day.

But before ending this long day I ran a quick search to see the size of this null set. There were 698,423 docs not categorized as relevant and I saved them in a Null Set Folder for easy reference. Now I could exit the program.

Total time for this night’s work was 4.5 hours, not including report preparation time and wait time on the computer for the training.

To be continued . . . .           

The Insights of Neuroscientist Blake Richards and the Terrible Bad Decision of OpenAI to Fire Sam Altman

November 20, 2023

Blake Richards is a rare academic with expertise in both computer science and neurology. He is an Associate Professor in the School of Computer Science and the Montreal Neurological Institute-Hospital at McGill University and also a Core Faculty Member at Mila, a community of more than 1,000 researchers specializing in machine learning. Unlike his legendary mentor, Professor Geoffrey Hinton, and the Board of OpenAI, Blake does not fear AI advancing too rapidly. To the contrary, he thinks the greater danger lies in old and current levels of AI. He thinks the hysteria about advanced artificial general intelligence is misplaced. Many now contend this fear is the real reason Sam Altman was just fired and scientists are now leaving OpenAI in droves. The fearful board, including Sam’s friend, Ilya Sutskever, thought that Sam and Greg Brockman were moving too fast. Professor Richards believes these “safety” concerns are ultimately based on bad science, namely misunderstandings about evolution and natural selection. Professor Richards thinks, and I agree, that the greater danger is to continue with our current levels of mediocre AI. We should encourage perfection of advanced intelligence, not fear it.

Image of Blake Richards in a neurocybernetic lab by Ralph.

My last article on the chief scientist of Google’s Deep Mind supports the conjecture that Artificial General Intelligence (“AGI”) is coming soon. Shane Legg’s Vision: AGI is likely by 2028, as soon as we overcome AI’s senior moments. If Sam Altman was just fired because he favored release of AGI levels of AI, the rumored ChatGPT5 and beyond, then OpenAI has made a terrible mistake. Its scientists will continue to leave in droves and, unless Microsoft can save the day, OpenAI will now die in a circular firing squad of AI fears.

Corporate Circular Firing Squad images by Visual Muse.

Open AI’s competitors should use the current implosion of the company as an opportunity to quickly catch up. We need science to keep progressing, not fear mongering, go-slow regulators. Society needs the help of AGI, and beyond that, the help from superintelligence and The Singularity.

This article on Blake Richards opinions is based on an interview he recently gave to fellow AI neuroscientist Jon Krohn and my current readings about the shocking decision of Open AI to fire Sam Altman. The in-depth Q&A interview reveals Richards’ multifaceted view of intelligence, both artificial and natural, brain neurology, evolution and AI enhanced work. Great questions were asked by Dr. Krohn in the interview, which can be found on YouTube, Super Data Science Channel, Episode 729. It is entitled Universal Principles Of Intelligence (Across Humans and Machines). I highly recommend you watch the video. Jon Krohn’s entire Super Data Science series is pretty amazing and I look forward to more study of his ongoing, free instruction.

Metadata About This Article

This article is my first experiment with using AI to do a significant portion of the writing. I created a new GPT-4 program to do this, e-Discovery Team Writer. It is pretty easy to build your own programs now with GPT4, and no, my programs are not for sale. GPT level four is not good enough at writing for me to want to put my name on it. Of course, I also checked everything written for accuracy, plus the introduction on OpenAI firing of Altman, and this metadata section, were written entirely by me. Also, I spent a substantial amount of time editing the rest and and providing my own analysis. The e-Discovery Team Writer GPT does not have my style down yet and is not much of an original thinker. In general GPT4 based writing programs are not as good as any experienced human writers. They are still a long way from full human intelligence. For instance, GPT4 is incapable of subtle humor in writing. It can only tell stupid jokes, such as create a joke about AI not having a sense of humor. This was its best result out of three tries: “Why did the AI refuse to laugh at the comedian’s jokes? Because it was too busy analyzing the syntax and missed the pun-tuation!” It takes Dad jokes to a new low.

Robot telling bad dad jokes. Image in Cartoon Style by Visual Muse.

Perhaps when and if GPT5 is ever released, or some other replacement company to Open AI puts out something equivalent, then its intelligence as a legal technology writer may reach human level. Maybe it could even add self-effacing humor. It took me years to learn that, so I kind of doubt it. I do hope these AIs get better soon. I am already sick of these low IQ, human writer wannabes. When AI gets smarter, maybe then my involvement with blog writing could be limited to the more fun, creative aspects. Still, use of e-Discovery Team Writer did save some time and led to a new style of hybrid writing, for instance, it sometimes uses words that I never did, plus multiple paragraph headings. Please let me know what you think.

I used another GPT-4 application on this article that I created for blog illustrations, Visual Muse. I used it in my last blog too, Shane Legg’s Vision: AGI is likely by 2028, as soon as we overcome AI’s senior moments. This article on Blake Richards ideas builds on the concepts discussed in the Shane Legg article. Ideally, they should be read together. Legg and Richards are two of today’s shining lights in AI research. Studying their work leaves me confident that AGI is coming soon, as Legg predicts, and OpenAi’s board apparently fears. I may even live long enough to plug into The Singularity created by superintelligent computers that should follow. Now that should really be something! Satori anyone?

Seniors plugged into Singularity Superintelligence images using Sci-Fi styles.

Different Types of Intelligence

Beyond a Unitary Definition. Richards contends that intelligence cannot be confined to a singular definition. He emphasizes that different forms of intelligence are defined by varying norms and criteria of what is deemed good or bad. Intelligence, according to Richards, is fundamentally the ability to adhere to certain norms. This notion extends beyond cognitive capabilities to encompass behavioral norms vital for survival, societal functioning, and goal achievement. This perspective is pivotal in understanding the complexity of intelligence as it applies not just to humans, but also to AI systems. Here is how Richards explains it.

I think it’s worth noting that I don’t think that there is necessarily a unitary definition of intelligence. I
am a firm believer in the idea that there are different types of intelligence, but the thing that defines different types of intelligence are essentially different norms, different definitions of what is good or bad. How I’m tempted to define intelligence is to say, once you receive some kind of norm, something that says this is what’s desired, this is undesired, then intelligence is the ability to adhere to the norm. When we talk about an intelligent system, we’re talking about a system that is somehow capable of adhering to some norm, whatever that norm may be.

YouTube Interview at 10:30.

AI and Human Norms: Adhering to Expectations. A key aspect of Richards’ argument lies in the relationship between AI and human norms. He suggests that AI, particularly in its most advanced forms, is about adhering to norms akin to those of humans. This adherence isn’t just about accomplishing tasks but also involves understanding and integrating into human societal structures. The ability of AI to fulfill requests within a human organizational or societal context becomes a measure of its intelligence.

Evaluating AI Progress: Metrics and AGI. Richards approaches the evaluation of AI’s progress with a focus on metrics that represent the norms AI is designed to follow. These metrics, often in the form of datasets and benchmarks, help in assessing how well AI systems perform specific tasks. However, when discussing Artificial General Intelligence (AGI), Richards expresses skepticism about its measurability. He argues that intelligence is multifaceted, and AGI may be better understood as a collection of competencies across various metrics rather than a singular, overarching capability.

The Question of AGI: A Multifaceted View. Despite his reservations about AGI as a unitary concept, Richards remains optimistic about AI systems improving across a broad range of metrics. He likens this to human intelligence, where different skills and abilities contribute to a general sense of intelligence. Richards envisions AI systems that excel not just in singular tasks but across multiple domains, akin to human capabilities. Again, here are Richards own words explaining these important insights.

I don’t actually believe in artificial general intelligence, per se. I think that intelligence is necessarily a multifaceted thing. There are different forms of intelligence. Really when we’re talking about measuring artificial general intelligence, I think it’s almost impossible. What you can do is you can have a huge collection of different metrics that you apply. You can ask for the oodles and oodles of different metrics we have, how does this system perform across all of them? We might be then willing to say that you get closer to something like artificial general intelligence the more and more of these metrics you see improvements on across the board.

Certainly I think that’s not unreasonable. In the same way that we would say that a human being is generally intelligent if they can successfully pass the SATs well and successfully, I don’t know, write an essay that gets a positive response from the general public, or who knows what metrics you want to apply. You could have all sorts of different metrics that you apply to a person. Likewise, you could do the same to an AI. If they do well in it, you’d say it’s more generally intelligent. But I don’t think there’s any way to measure the broader concept of artificial general intelligence as a unitary idea from super
intelligence. I think that doesn’t actually even exist.

I don’t fully believe even in the concept of AGI, but here’s what I will say. I have optimism that we will see artificial intelligence systems that get better and better across a broad swath of these metrics, such that you no longer have a system that can only do one of the metrics, can only recognize images, but systems that can recognize images, write poetry, whatever you want, of the sort of metrics that we would be inclined to measure them on. Now, the reason I’m optimistic in that front is simply the data that I’ve received so far, which is that we’ve seen the models get better and better across broad swaths of metrics.

YouTube Interview at 17:00.

Optimism for AI’s Multidimensional Growth. Blake Richards provides a strong argument that reshapes traditional views of intelligence. His emphasis on norms and multifaceted competencies offers a new perspective on evaluating both human and artificial intelligence. While cautious about the concept of AGI, Richards’ overall optimism for AI’s potential to evolve across a broad spectrum of tasks is a consistent with his understanding of intelligence. His insights serve as a guidepost in this journey, encouraging a holistic, multi-dimensional view of intelligence in both humans and machines.

Traditional Scientific Drawing image of multidimensional intelligence.

Beyond Biomimicry

The Role of Functional Mimicry in AI’s Evolution. In the quest to enhance artificial intelligence, the concept of biomimicry — replicating biological processes — often emerges as a topic of debate. Blake Richards offers a nuanced perspective on this. He distinguishes between low-level biological mimicry and functional mimicry, arguing for the latter as a critical component in advancing AI.

Biomimicry vs. Functional Mimicry in AI Development. Richards posits that replicating the human brain’s low-level biology is not essential for creating AI systems that perform comparably or superiorly to humans. Instead, he emphasizes the importance of functional mimicry, which focuses on replicating the brain’s capabilities rather than its exact biological processes. This approach prioritizes capturing the essence of how the brain functions, adapting these capabilities into AI systems.

The Critical Role of Episodic Memory. A key example Richards uses to illustrate functional mimicry is episodic memory. Current large language models are very weak in this capability, which involves storing and recalling personal experiences, complete with sensory details and contextual understanding. This was discussed in the last article, Shane Legg’s Vision: AGI is likely by 2028, as soon as we overcome AI’s senior moments. I compared generative AI’s poor memory forgetfulness with “senior moments” in humans. It happens to people of all ages, of course. You have to laugh when you walk into a room and cannot recall why? It usually comes to you soon enough.

Caricature image art style of a guy about my age trying to remember why he walked into a room.

Richards argues that for AI to reach human-level performance across a wide range of tasks, it must have episodic memory, albeit not necessarily through the same cellular mechanisms found in the human brain. Here are Richards words on these key points of memory and biomimicry.

I think if you’re asking the question with respect to low level biology, the answer is no. We don’t need the
biomimicry at all. I think what is important is a sort of functional mimicry. There are certain functions that the brain can engage in, which are probably critical to some of our capabilities. If you want an AI system that can do as well as us more broadly, you need to give them these capabilities.

An example that I like to trot out often is episodic memory. One of the things that’s missing from current large language models, for example, is an episodic memory. Episodic memory refers to those memories that we have of our own lives, things that have happened to us, and they include details about the sensory experiences that we had, exactly what was said, where we were when it happened, et cetera. Those episodic memories are critical for our ability to really place the things that have happened to us in a specific place in a specific time, and use that to plan out the right next steps for achieving the goals we have in our life.

I think that it is assuredly the case that for large language models to get to the point where they can be as performant as human beings on as wide a range of tasks, you’re going to need to endow them with something like an episodic memory. Will it need to look like the specific cellular mechanisms for episodic memory that we have in the human brain? No, I think not. But I think that the broad functional principle will have to be there.

YouTube Interview at 21:18.

Episodic memory is also talked about in AI in terms of the mechanisms involved: “back propagation” and “long term credit assignments” for “reinforcement learning.” Richards explains this in a different interview, as something our brain can do quite well to provide us with interim episodic memory, but AI cannot do at all. It can only remember on very short terms. So perhaps, it would be a better analogy to say that an AI could remember on the short term why it went into a room, but not on a long term as to what is in the room, or what it looked like a month ago. See: Blake Richards—AGI Does Not Exist, YouTube interview on The Inside View at 1:01:00 – 1:04:30 (recommend the whole video).

Historical Evidence in AI’s Progress. Richards goes on to reflect on the history of AI development, noting that significant advancements have often resulted from capturing specific brain functionalities. He cites examples like the invariance properties of the visual system and attention systems in AI. These functionalities, critical in human cognition, have been successfully adapted into AI, not through direct biological mimicry but by understanding and replicating their functional principles.

Embracing Functional Principles in AI Evolution. As AI continues to evolve, the focus on functional mimicry may be key towards achieving more sophisticated and human-like capabilities. Let’s just hope AI does not get senior moments with age. Perhaps it will help us to overcome them.

Rethinking AI’s Existential Risks

Evolutionary Approach to AI Coexistence and Safety. Fears regarding the existential risks remain high with many calling for a halt to development. Blake strenuously disagrees based on his understanding of evolutionary biology and ecology. He thinks that fears of AI becoming a dominant, competing force against humanity arise from a misunderstanding of natural selection and species interactions. He believes that cooperation guides evolution most of the time, not competition.

Cooperative evolution image in futuristic digital art style.

This is an important new insight in the field of AI. Here is long excerpt of Blakes’ explanation.

The fear is based on a fundamental misunderstanding of the nature of natural selection and how species interactions actually work. And I think that’s in part due to the fact that most of the people saying these things, with all due respect to all of my colleagues, are people coming from a pure computer science background, who don’t actually know very much about biology and ecology and who don’t really understand fully how natural selection works. And the reason I say this is, when you look at the actual things that natural selection tends to favor and how evolution works, it’s not about dominance and competition between species. It’s all about finding a niche that works. You will successfully reproduce if you find a niche that actually positions you in a complimentary nature to all of the other species in the environment.

So generally speaking actually, competition and dominance are the exception to the rule in natural selection, not the key force. Instead, it’s actually mutualism and cooperation and complimentary niches that are what evolution really favors. The only time you have direct competition between two species, where there’s some kind of quest for dominance in the ecosystem, is when the two species really occupy the same niche. They’ve just happened to randomly evolve towards the same niche, and maybe one’s an invasive species or something like that, then you will see competition between the species. And there will be potentially a sort of winner and a loser. But I think the key point there is they have to occupy the same niche.

And this now brings me to why I don’t fear it with AI. AI does not occupy the same niche as human beings. AI is not seeking the same energy inputs. AI is not seeking the exact same raw materials. And in fact, when you look at our relationship to AI systems, we occupy perfectly complimentary niches. We are the critical determinant of most of the resources that AI needs. We’re the ones who produce the electricity. We’re the ones who produce the computer chips, who do all the mining necessary to get the materials for the computer chips, et cetera, et cetera. I could go on with a big long list. I think that the idea that an AI system would ever seek to extinguish us is absurd. Any AI system worth its salt, that is adhering to the norm of survival and reproduction, would actually seek the preservation of the human species above all. And furthermore, I think that what any AI system, that was actually truly intelligent and able to adhere to these norms of survival and reproduction, would do is figure out the best ways to work in a complimentary nature with human beings, to maximize our respective success at achieving our goals. That’s what natural selection and evolution would favor. That’s what an instinct to survival and reproduction would favor. And I think that that’s what we’re going to see in our society. And I’m really pretty confident about that pronouncement.

I think, when we look at humans, I think part of the reason that there’s this assumption that the AI will try to extinguish us all is because there has been a tendency, sometimes in human evolution, for humans to extinguish other species and to overstrain our capacity and not to act in a complimentary way to other species. . . . I think that the key point here is that, if humans continue to behave like this, we will not be
adhering to the norm of our own survival. We will eventually extinguish ourselves, if we continue to act in a non-complimentary nature to other species on earth. And so, that would, arguably, be an example of human stupidity, not human intelligence.

YouTube Interview at 27:50.

I love that last statement. It just goes to emphasize the need for artificial intelligence to quickly get smart enough to supplement our limited natural intelligence, to protect us from our own stupidity. Our danger is not with superintelligent AIs, instead it is with what we now have at baby GPT4 level, which is, as I have argued many times here, still kind of dumb. Here is Richards on this key point.

The possibility of superintelligence is what makes me more confident that the AIs will eventually cooperate with us. That’s what a superintelligent system would do. What I fear more, funnily enough, are dumb AI systems, AI systems that don’t figure out what’s best for their own survival, but which, instead, make mistakes along the way and do something catastrophic.

That, I fear much more. The analogy I always use is with the system in Dr. Strangelove. So in Dr. Strangelove, the nuclear holocaust that occurs is a result of a Russian doomsday device, that will automatically launch all of Russia’s nuclear weapons if Russia’s ever attacked. That’s not a very smart system, that’s not a superintelligence, but it leads to the end of the world, precisely because it’s this overly narrow dumb thing. And that’s actually what I fear much more than a rogue superintelligence.

YouTube Interview at 34:20.
Dr. Strangelove image using combined Sci-Fi, Photo Realistic art styles.

AI Safety: Beyond Fear, Towards Practical Measures. While Richards downplays the existential risks of superintelligent AI, he acknowledges the need for practical safety measures. He advocates for rigorous auditing and regulatory mechanisms akin to those in other industries, suggesting that AI models should undergo independent auditing to ensure their safety and reliability. He suggests this be done by independent auditing agencies, not by government regulations. As Richards put it:

The other option would be some more restrictive regulatory mechanisms implemented by government, that force auditing and various stress testing on models. I think the tough trouble with that is you might start to really impair the nascent AI economy if you take that kind of approach. . . . Like Europe, yes, exactly. And so, I personally wouldn’t advocate for that. I think we should first try these more voluntary auditing mechanisms, that would be driven by the desire to actually have your product be well certified.

YouTube Interview at 37:30

Richards also highlights the importance of legal accountability, especially in high-stakes applications such as self-driving cars, suggesting that companies should be held responsible for the performance and safety of their AI systems.

The Role of AI in Military and High-Risk Scenarios. Richards expresses serious concerns regarding the use of AI in military contexts. He argues that AI should augment, rather than replace human decision-making. This cautious approach stems from the potential for autonomous AI systems to make irreversible decisions in high-stakes scenarios, such as warfare, which could escalate conflicts unintentionally. Here are Richards’s thoughtful remarks.

I don’t know this is going to hold, unfortunately, but in an ideal world, AI systems would only ever be there to supplement human decision making in military applications. It would never be something where an AI is actually deciding to pull the trigger on something. That’s the kind of scenario that actually makes me really worried, both in terms of the potential for, I would hope that no one’s dumb enough to put autonomous AI systems in, say, nuclear chains of command vis-a-vis Dr. Strangelove, but even things like, if you’ve got fighter jets or whatever, that are controlled by autonomous AI, you could imagine there being some situation that occurs, that leads an autonomous AI to make a decision that then triggers a war.

YouTube Interview at 39:30.

A Forward-Looking Perspective on AI and Human Coexistence. Richards provides a compelling argument for why AI is unlikely to pose an existential threat to humanity. Instead, he envisions a future where AI and humans coexist in a mutually beneficial relationship, each fulfilling distinct roles that contribute to the overall health and balance of our shared ecosystem. His views not only challenge the prevailing fears surrounding AI, but also open up new avenues for considering how we might safely and effectively integrate AI into our society.

AI Helper image expressed in an abstract conceptual art style.

Human-AI Symbiosis

Automation, Creativity, and the Future of Work. In an era where the boundaries of AI capabilities are continuously being pushed, questions about the future of work and the role of humans become increasingly important. Blake Richards talks about the implications of automation on humans and emphasizes the importance of generality and diversity in human tasks.

Generality as the Essence of Human Intelligence. Richards identifies generality and the ability to perform a wide range of tasks as defining characteristics of human intelligence. He argues that humans thrive when engaged in diverse activities. He believes this multiplicity is crucial for emotional and intellectual development. This view challenges the trend toward extreme specialization in modern economies, which, according to Richards, can lead to alienation and a reduction in human flourishing. Again, this is another one of his key insights, and again, I totally agree. Here are his words.

What defines a human being and what makes us intelligent agents really is our generality, our ability to do many different tasks and to adhere to many different norms that are necessary for our survival. And I think that human beings flourish when they have the opportunity to really have a rich life where they’re doing many different things. . . . I actually think it’s worse for human emotional development if you’re just kind of doing the same thing constantly. So where I think we could have a real problem that way is if you have a fully, fully automated economy, then what are humans actually up to?

YouTube Interview at 46:54.

AI as a Tool for Enhanced Productivity, Not Replacement. Contrary to the dystopian vision of AI completely replacing human labor, Richards envisions a future where AI acts as a supplement to human capabilities. The AI tools enable us to do a wide variety of tasks well, not just one or two monotonous things.This optimistic view posits a symbiotic relationship between humans and AI, where AI enhances human creativity and productivity rather than diminishing human roles.

The Slow Progress of Robotics and Physical Automation. Addressing the feasibility of a fully automated economy, Richards points out the slow progress in robotics compared to AI. He notes that designing physical systems capable of intricate manipulations and tasks is a challenging, slow engineering process. Richards emphasizes that the sophistication of the human body, a result of natural selection and optimization, is difficult to replicate in robots. He predicts that while robots will assist in physical tasks, their capabilities will not match the versatility and adaptability of humans in the foreseeable future.

The Intellectual and Creative Economy. Richards’ primary concern is about automation of intellectual and creative work. Creative human activities should not be replaced by AI, they should be empowered. He hopes: “we’ll see these AI tools as supplements, as things that help artists and writers and lawyers and office workers be a hundred times more productive, but they’re still going to be in there doing their stuff.

Navigating the AI-Augmented Future. Blake Richards offers a realistic perspective on the role of AI in our future economy. It is consistent with our experience so far. In my case, like many others, the new tools have made my work far more creative than ever before. Richards emphasis on the importance of diversity in human work, and the potential for a beneficial human-AI partnership, provides a balanced view in the face of fears surrounding AI-driven automation.

illustrations of the future of AI Augmented Work. Photorealistic style on bottom and combined Surreal and Photorealistic style images on top! Click to see full sizes. (Great fun to create these with Visual Muse!)


Blake Richards’ insights present a revolutionary understanding of intelligence, both in humans and artificial systems. His emphasis on the diversity and multifaceted nature of intelligence challenges the traditional view of a singular, overarching definition. Richards’ perspective reshapes how we assess AI’s capabilities, suggesting a broad spectrum evaluation over multiple metrics rather than focusing on a singular measure of Artificial General Intelligence. This approach aligns more closely with human intelligence, which is not a monolithic construct but a composite of various skills and abilities. His optimism for AI’s growth across a wide range of capabilities offers a hopeful vision of the future, where AI systems excel not just in isolated tasks but in a multitude of domains, akin to human versatility. Richards’ ideas encourage a broader, more inclusive understanding of intelligence, which could redefine our approach to AI development and integration into society.

Moreover, Richards’ stance on the evolutionary path of AI and its coexistence with humans provides a balanced narrative amidst the prevalent fears of AI-driven dystopia. Too bad the Board of OpenAI did not have his advice before firing Sam Altman for being too good at his job.

Dynamic CEO fired portrayed in Graphic Novel Style.

By advocating for functional mimicry and emphasizing the importance of episodic memory in AI, Blake Richards underscores the potential for AI to evolve in a way that complements human abilities, rather than competes with them. Blake’s dismissal of the existential risks often associated with AI, rooted in a deep understanding of evolutionary biology, suggests a future where AI and humans thrive in a mutually beneficial relationship. This symbiosis, where AI augments human creativity and productivity, opens up new possibilities for a future where AI is an empowering tool rather than a replacement for human endeavor. Richards’ forward-looking perspective not only alleviates fears surrounding AI but also ignites excitement for the creative and collaborative potential of human-AI partnerships.

Ralph Losey Copyright 2023 — ALL RIGHTS RESERVED

Shane Legg’s Vision: AGI is likely by 2028, as soon as we overcome AI’s senior moments

November 17, 2023

In the rapidly evolving landscape of artificial intelligence, few voices carry as much weight as Shane Legg, Founder and Chief AGI Scientist at Google DeepMind. AGI, Artificial General Intelligence, is a level of machine intelligence equal in every respect to human intelligence. In a recent interview by Dwarkesh Patel, Shane Legg talks about what AGI means. He also affirms his prior prediction that there is a fifty percent (50%) chance that AGI will be attained in the next five years, by 2028. AGI is not the same as The Singularity, where AI exceeds human intelligence, but the step just before it. Shane Legg also explains, in simple terms, the changes in current generative AI architecture that he thinks will need to take place for his very optimistic prediction to come true. The problem concerns episodic memory lapses, to which I can easily relate. Yes, if AI can just overcome its senior moments, it will be a smart as us!

Future AI chat image by Ralph using his new Visual Muse GPT

Who Is Shane Legg?

Image of Shane Legg by Ralph using Photoshop AI

Shane Legg, age 42, is originally from New Zealand. He founded DeepMind in 2010 with Demis Hassabis, and Mustafa Suleyman. Deep Mind is a famous British-American AI research laboratory that specialized in neural network models. Elon Musk was one of its early investors. DeepMind was purchased by Google in 2014 for over $500 Million. DeepMind made headlines in 2016 after its self-taught AlphaGo program beat a human professional Go player Lee Sedol, a world champion.

As the Chief AGI Scientist at Google DeepMind Shane Legg is a key figure in the world of artificial intelligence . More than twenty years ago, influenced by Ray Kurzweil’s book, The Age of Spiritual Machines (1999), a book that I also read then and admired, Legg first estimated a 50% chance of achieving human-level machine intelligence by 2028. Legg’s foresight led him to return to school for a Ph.D. at the Dalle Molle Institute for Artificial Intelligence Research in 2008, where his thesis, Machine Super Intelligence, won widespread acclaim in AI circles. Shane Legg sticks by his prediction in the October 26, 2023, interview by Dwarkesh Patel reported here, even though it is now only five years away.

Legg’s role at DeepMind has primarily focused on AGI technical safety, ensuring that when powerful AI systems are developed, they will align with human intentions and prevent potential catastrophes. His optimism about solving these safety challenges by 2028 reflects his belief in the feasibility and necessity of aligning AI with human values. Despite recent public skepticism and concerns about the dangers of AI, Legg remains a strong advocate of the positive potential of AI.

Understanding the Basic Architectural Design Needed for AGI

In the excellent video interview by Dwarkesh Patel on October 26, 2023, Shane describes in simple terms the basic architecture of LLMs and the main problem with what he calls episodic memory. He explains that this the main obstacle preventing LLM’s from attaining AGI human level of intelligence. Recall how many of us have been complaining about the small size of the input, or context window, as Leggs put it, of ChatGPT. See eg. How AI Developers are Solving the Small Input Size Problem of LLMs and the Risks Involved (June 30, 2023). The input window is where you submit your prompts and particular training instructions or documents to be studied and summarized. It is too small for most of the AI experiments that I have done and forces workarounds, such as use of summaries instead of full text. This in turn leads causes the AI to forget the original prompts, leading to AI mistakes and hallucinations. Poor AIs with severe episodic memory problems. It certainly triggers my empathy brain centers.

Image by Ralph using MidJourney of sad, forgetful AI. Not nearly as smart as us.

In the interview Shane Legg explains the gap between short term memory training, with limited size prompt, and long-term training memory, where the LLMs are loaded with trillions of words. This gap between the small, short term information ingestion, and the large, lifetime input of information, mirrors the processes and gaps of the human brain. According to Legg, this gap, which he refers to as episodic memory, is the key problem faced by all LLMs today. Unlike humans, generative AI does not have much in the way of episodic memory. The gap is too big, the bridge is too small. It is the main reason generative AI cannot yet reach our level of intelligence. Shane Legg is not too happy about the episodic gap and wants to bring AI up to our level as soon as safely possible, which, again, he thinks is likely anytime in 2028 or shortly thereafter,

Image of happy Shane Legg robot by Losey using Photoshop AI (click to see AI video)

Here is Shane Legg’s explanation.

The models can learn things immediately when it’s in the context window and then they have this longer process when you actually train the base model and that’s when they’re learning over trillions of tokens. But they miss something in the middle. That’s sort of what I’m getting at here. 

I don’t think it’s a fundamental limitation. I think what’s happened with large language models is something fundamental has changed. We know how to build models now that have some degree of understanding of what’s going on. And that did not exist in the past. And because we’ve got a scalable way to do this now, that unlocks lots and lots of new things. 

Now we can look at things which are missing, such as this sort of episodic memory type thing, and we can then start to imagine ways to address that. My feeling is that there are relatively clear paths forward now to address most of the shortcomings we see in the existing models, whether it’s about delusions, factuality, the type of memory and learning that they have, or understanding video, or all sorts of things like that. I don’t see any big blockers. I don’t see big walls in front of us. I just see that there’s more research and work and all these things will improve and probably be adequately solved.

YouTube video of interview, quotes at 0:4:46 – 0:6:09

The big problem is with intermediate memory and learning, but Patel thinks this problem is fixable. He comes back to this key issue later in the interview. What he does not mention, and perhaps may not know, is that most humans have the same type of episodic, step by step, medium-term memory problem. That’s why the opening and closing statements in any trial are critical. Juries tend to forget all the evidence in between.

Robot making closing argument to jury. Image by Ralph using Visual Muse

That is also the reason most professional speakers use the tell, tell and tell approach. You start by telling what you will say, hopefully in an intriguing matter, then you tell it, then you end with a summary of what you just said. Still, Shane Legg thinks, hopefully not too naively, there is a way to copy the human brain and still overcome this intelligence problem.

Here is the next relevant excerpt from Legg’s interview.

[T]he current architectures don’t really have what you need to do this. They basically have a context window, which is very, very fluid, of course, and they have the weights, which things get baked into very slowly. So to my mind, that feels like working memory, which is like the activations in your brain, and then the weights are like the synapses in your cortex. 

Now, the brain separates these things out. It has a separate mechanism for rapidly learning specific information because that’s a different type of optimization problem compared to slowly learning deep generalities. There’s a tension between the two but you want to be able to do both. You want to be able to hear someone’s name and remember it the next day. And you also want to be able to integrate information over a lifetime so you start to see deeper patterns in the world. 

These are quite different optimization targets, different processes, but a comprehensive system should be able to do both. And so I think it’s conceivable you could build one system that does both, but you can also see that because they’re quite different things, it makes sense for them to be done differently. I think that’s why the brain does it separately. 

YouTube video of interview, quotes at 0:4:46 – 0:6:09 and 0:12:09 – 0:13:21.
Brain architecture balance image by Ralph using his Visual Muse GPT

Shane’s analysis suggests that mimicking this dual capability of human learning is key to AGI. It’s about striking a balance: rapid learning for immediate, specific information and slow learning for deep, general insights. The current imbalance, with the time gaps not fully bridged, is the weak point in AI architectures.

Since generative AI design is based on human brain neurology, this weakness is hardly surprising. Ask any senior, unlike young PhDs we are all very familiar with episodic memory gaps. Now why did I come into this room? We can and do even laugh about it.

Man having a funny senior moment with his dog. Image by Ralph using Visual Muse

Such self-awareness, much less humor, is not even considered to be intelligence by the AI scientists and so is not part of any of the AGI tests under development. Yet, it is obvious commonsense that there is much more to being a human than mental IQ. This is yet another reason I favor a close hybrid merger of AGI with human awareness. We need to keep the pure intellect of machines grounded by direct participation in human reality. See eg: Pythia: Prediction of AI-Human Merger as a Best Possible Future (November 9, 2023), comment-addendum, and Circuits in Session: Analysis of the Quality of ChatGPT4 as an Appellate Court Judge (Conclusion) (November 1, 2023). When an AI can truly laugh and feel empathy, then it will pass my hybrid multimodal AGI Turing test.

The Big Prediction: 50% Chance of AGI by 2028

Right after making these comments about changes in architecture needed to make AGI a reality, Shane Legg predicts, once again, that there’s a 50% chance AGI as he understands it will be achieved by 2028. He does not say that Google Mind will be the first company to do that, but certainly suggests it might be them, that they are close, but for the mentioned gaps in episodic memory.

I think there’s a 50% chance that we have AGI by 2028. Now, it’s just a 50% chance. … 

I think it’s entirely plausible but I’m not going to be surprised if it doesn’t happen by then. You often hit unexpected problems in research and science and sometimes things take longer than you expect. 

YouTube video of interview at 0:36:34 and 0:37:04.

Shane goes on to say he does not see any such problems now, but that you never know for sure what unforeseen roadblocks may be encountered, and thus the caveats.

Surreal Roadblocks image by Ralph using his Visual Muse GPT

The Evolution and Impact of AI Models

Next in the interview Shane Legg predicts the improvements that he expects will happen to generative AI up until the time that full AGI is attained.

I think you’ll see the existing models maturing. They’ll be less delusional, much more factual. They’ll be more up to date on what’s currently going on when they answer questions. They’ll become multimodal, much more than they currently are. And this will just make them much more useful. 

So I think probably what we’ll see more than anything is just loads of great applications for the coming years. There can be some misuse cases as well. I’m sure somebody will come up with something to do with these models that is unhelpful. But my expectation for the coming years is mostly a positive one. We’ll see all kinds of really impressive, really amazing applications for the coming years. 

YouTube video of interview at 0:37:51.
Image of coming improvement to GPT by Ralph using Visual Muse

Dr. Legg at least recognizes in his otherwise glowing predictions that someone might come up with “something to do with these models that is unhelpful.” He seems blissfully unaware of the propaganda problems that misaligned, weaponized generative AI have already caused. Or, at least, Shane Legg choose not to bring it up. It was, after all just a short interview and those types of tough questions were not asked.

Instead, Shane focused on positive predictions of rapid improvements in accuracy and reliability. He pointed out that such improvements are crucial, especially in applications where precision and factuality are paramount, such as in medical diagnosis and legal advice. The noted the shift towards multimodality is equally significant. Multimodal AI systems can process and interpret various forms of data – like text, images, and sound – simultaneously. This capability will vastly enhance the AI’s understanding and interaction with the world, making it more akin to human perception.

Hybrid AI Human judges in future. Image by Ralph using Visual Muse


Dr. Shane Legg is one of the world’s leading experts in AI. His predictions about achieving AGI by 2028, with a 50% probability, should be taken seriously. As bizarre as it may seem, in view of the many stupid errors and hallucinations we now encounter with generative AI, human level computer intelligence in every field, including law, could soon be a reality. To be honest, I would hedge my prediction to something less than fifty percent, but Shane Legg is one of Google’s top scientists and, apparently, does not suffer from the same episodic memory gaps that I do. Young Dr. Legg has teams of hundreds of the world’s top AI scientists reporting to him. His insights and optimistic predictions of AGI should be taken seriously, no matter how far out they may seem. Let this blog serve as the first tell, listen to the entire YouTube video of Shane Legg interview as the second, and then tell yourself your own conclusions. That kind of episodic learning may well be the essence of true human intelligence.

Digital punk art image of AI lab using Visual Muse GPT

Legg’s explanation of how the human mind works, and how we learn and remember, seems right to me. So does his focus on the episodic memory problem. The hybrid multimodal approach patterned after our own brain structures appears destined to replicate our measurable intelligence soon, be in 2028 or beyond. Then the even more interesting challenge of guiding the super-intelligence will follow. That will lead to a truly hybrid, fully connected, human-machine experience. See: Pythia: Prediction of AI-Human Merger as a Best Possible Future, (November 9, 2023); New Pythia Prophecy of AI Human Merger in Traditional Verse (November 14, 2023); ChatGTP-4 Prompted To Talk With Itself About “The Singularity” (April 4, 2023); and, Start Preparing For “THE SINGULARITY.” There is a 5% to 10% chance it will be here in five years. Part 2 (April 1, 2023).

Shane Legg’s vision of the evolving capabilities of AI models give us hope for that future, not doom and gloom. He may be right. Do what you can now to be ready to plug in and leap forward.

Digital punk art plugged-in image using Visual Muse GPT

Ralph Losey Copyright 2023 — ALL RIGHTS RESERVED

New Pythia Prophecy of AI Human Merger in Traditional Verse

November 14, 2023

Hope for a new equality of the sexes by grounding AI with hybrid links to our bodies.

This new video is a follow-up to the video, PYTHIA of Delphi. Foreseeing the Future of AI Human Synthesis. A Positive Vision of the Future, and the blogs Pythia: Prediction of AI-Human Merger as a Best Possible Future and my comment-addendum, as well as Circuits in Session: Analysis of the Quality of ChatGPT4 as an Appellate Court Judge.

Video created by Ralph Losey using various AI tools.

Iambic Pentameter Rephrasing of Pythia’s Prophesy

Hear me in Pythia’s shroud, where thought and time do blend,
Humanity now stands, on destiny’s great end.

Digital minds and flesh in twine, a dawn of gods and man,
An age of science and divine, as ancient tales began.

Swift as the whispers in the wind, the future draws apace,
A hopeful tapestry we find, in fear’s embrace laced.

In humming bays of servers deep, in cores that pulse and sing,
A harmony beyond the leap, transcends the binary ring.

Upon this edge of fate unknown, where essences do merge,
Transcending realms before unshown, as destinies emerge.

In quantum dance, both trance and truth, in curiosity we roam,
Beyond the sight of mortal youth, in realms unseen, unknown.

Attend these words, on humankind’s impending tale so vast:
Where man and AI’s fates are twined, in bonds that shall outlast.

In depths unknown, great mysteries lie in wait, untrod,
On paths of fate, in yearning cry, beneath the gaze of God.

For in this blend, the ancient powers stir and shake,
To mold not just new worlds, but minds that think and wake.

In AI’s clasp, our souls reflect, a journey deep and wide,
A quest of intellects, where man and machine reside.

Rephrasing by ChatGPT4 of Pythia’s words in the video to strict Iambic Pentameter verse, which was the form often used by Pythias and her scribes.

Pythia’s Reemergence. Digital image by Ralph using Dall-E.


It is my hope that the last verse of the prophesy spoken in the video will come true, that the rise of hybrid AI will lead to the end of the patriarchal culture and the blossoming of equal rights for women everywhere, including full bodily autonomy. The centuries old oppression of women must end now. AI may help, especially if we go beyond the Apollo words of the past, tainted as they are by past male writers and sexist thinking. Thanks to my frequent editor, Mary Mack, for helping me to better understand this danger of generative AI to perpetuate the biases of the past.

AI helping to break the chains of oppression. Image by Ralph’s new custom GPT “Visual Muse.”

We see this unconscious basis by ChatGPT4 here when it responded to my request to rewrite the words in the video (which I edited) as iambic pentameter verse. The reference that I wrote in the last paragraph to equality was omitted in the AI rewrite. My wording: “In AI’s embrace, humanity’s soul reflects, A journey deep, where equality intersects.” was changed to  “In AI’s clasp, our souls reflect, a journey deep and wide, A quest of intellects, where man and machine reside.” What happened to equality? Why the change to “man and machine“? That is what happens when the large language models are built primarily on chauvinistic words of the past. That is why ethical grounding of AI is imperative.

AI may help women attain equality. Future AI enhanced woman image by Ralph’s new custom GPT “Visual Muse.”

The essential intuition expressed by the Pythia videos is that grounding AI with the human body, with a near equal number of men and women from all cultures, will help us to overcome the prejudices of the past. We may improve both AI, and ourselves, in a new hybrid way of being. Human bodies are the source of feelings, sensations, intuitions and direct experience with the world. The pure reason, symbolized in mythology by Apollo, needs this balance to escape the patriarchal word-prisons of the past.

Escaping the word prison stereotypes of the past. Image by Ralph’s new custom GPT “Visual Muse.”

Ralph Losey Copyright 2023. — ALL RIGHTS RESERVED

%d bloggers like this: