How Can We Determine If the Universe Is Not a Simulation?

On an ordinary evening, as you wrap up a day of work and sit down to start a game, the loading bar finishes, and a virtual world appears on the screen.

Immersed in the game, you suddenly notice an ant crawling slowly on your monitor. In that instant, an image from the novel The Three-Body Problem comes to mind. The ant feels like a subtle reminder of something profound, sending a chill down your spine. A startling thought strikes you like lightning—what if each of our lives, or even this vast world, is nothing more than a simulation crafted by an advanced civilization?

At first, the idea may seem absurd. But consider this: the foundation of our existence, much like many major discoveries in human history, has been overturned repeatedly. Five hundred years ago, geocentrism was an unshakable belief. Two hundred years ago, proposing that humans evolved from apes could have branded someone a heretic. Humanity’s understanding of nature is a journey of realizing that we are not as exceptional as we once believed. This concept is embodied in the Copernican Principle, which asserts that no observer should regard themselves as uniquely special.


▷ The diagram “M” (for the Latin Mundus, “world”) from Johannes Kepler’s Epitome of Copernican Astronomy illustrates this point: the Sun is just one of countless stars, a stark reminder that humans, whether on Earth or within the solar system, are not privileged observers of the universe.

1. Ancestor Simulation

If you are familiar with the history of video games, you might recall how gaming has evolved dramatically over just a few decades—from the simple, pixelated graphics of ping-pong games to the lifelike environments of modern massively multiplayer online games. Looking to the future, with the rapid advancement of virtual reality, the scenarios depicted in the sci-fi series Black Mirror—such as uploading and downloading consciousness—seem almost within reach.

In this context, the notion that “we are virtual beings living in a simulation” begins to feel less far-fetched. After all, on a cosmic timescale, a few decades are barely a blink of an eye. And as the Copernican Principle reminds us, it is statistically improbable that humanity represents the pinnacle of civilization in the universe.

So, how likely is it that we are living in a simulation? Once you begin to think seriously about this question, you’ll realize you are not alone. Many physicists and philosophers have explored this possibility, sharing their thoughts through publications and public debates. In 2016, the American Museum of Natural History in New York hosted a panel where four physicists and a philosopher spent two hours discussing whether reality as we know it could be a simulation. Their estimates of the probability ranged widely, from as low as 1% to as high as 42%. That same year, Elon Musk famously declared in an interview that “we are most likely living in a simulation.”

However, following the crowd has never been your style. Determined to explore this question independently, you start gathering evidence. Could we, in fact, be living in a simulation? And if so, is there a way to break free from it?

The type of “simulation” discussed here is not a simple game like The Sims. Instead, it is a comprehensive model of the observable universe—what scholars refer to as an Ancestor Simulation [1]. This hypothesis proposes that a civilization with sufficiently advanced technology might create a simulation on an immense scale. While we cannot predict the exact capabilities of such future technologies, we can attempt to estimate their upper limits based on the physical laws of our observable universe.

If running a simulation of this magnitude would require energy and computational power beyond the constraints of physics, we could argue that such a civilization could not exist. In that case, the Ancestor Simulation hypothesis would be disproven—or, at the very least, humanity’s existence within such a simulation would become highly improbable.


▷ Source: Claire Merchlinsky

2. The Energy Required for Simulation

High-fidelity simulations necessitate immense information processing, which, with current technological capabilities, translates into extraordinarily high computational demands. In such scenarios, the state of every fundamental particle would need to be recorded and updated in real time, requiring computational resources on an astronomical scale. Furthermore, the energy needed to power these highly complex simulations would entail staggering demands. According to thermodynamic principles, extracting energy from cosmic background radiation to sustain computations of this magnitude would not only require overcoming theoretical physical limitations but also addressing significant practical challenges in technology.

These arguments suggest that, within our current technological and theoretical framework, energy constraints make it improbable that we are living in a simulated universe. However, such constraints may not represent an insurmountable barrier. For instance, a super-advanced civilization might bypass energy limitations by creating less precise simulations. Quantum mechanics, with its uncertainty principle—which prevents precise knowledge of an electron’s exact position—and the non-local effects of quantum entanglement, could serve as efficient methods to conserve computational resources during simulation construction.

If we are indeed part of a simulation, and its creators used resource-saving measures, we might uncover evidence of this illusion. One potential approach is to continuously increase the resolution of our observations to detect whether we encounter discrete, pixelated structures. This is similar to video games, where increasing the resolution can expose hidden inconsistencies or graphical artifacts. Interestingly, quantum mechanics already reveals that the world appears discontinuous at extremely small scales, lending some plausibility to the simulation hypothesis.

Moreover, in video games, wandering to the edges of the map often leads to graphical glitches or invisible “walls” that limit further exploration. By analogy, searching for similar anomalies could help determine whether we exist within a simplified ancestor simulation. However, in reality, humanity’s farthest-reaching probe, Voyager 1, has traversed the edge of the solar system without encountering any discontinuities or anomalies. Likewise, the Hubble Space Telescope has captured images of galaxies over 10 billion light-years away without uncovering any apparent anomalies. While these observations cannot definitively rule out the possibility that we are living in a simulation, they do significantly reduce its likelihood.


▷ Source: Staudinger + Franke

3. A Simulated Universe Does Not Require Full Brain Simulation

Beyond the vastness of the starry sky, the human mind is equally complex. The number of neural connections in the human brain is estimated to rival the number of stars in the observable universe. A 2024 study published in Science [2] produced a nanoscale map of just 1 cubic millimeter of brain tissue, encompassing 57,000 cells and 150 million synapses; the resulting mapping data totaled 1.4 petabytes (PB). Extrapolated to the full volume of the human brain, a static map of the entire brain would require about 1.76 zettabytes (ZB), where 1 PB is roughly 1,000 terabytes (TB) and 1 ZB is roughly 1,000,000 PB. This vastly exceeds the scale of current open-source models such as Llama 3.1, the largest of which had 405 billion parameters as of October 2024.
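As a rough sanity check, the extrapolation can be written out explicitly. The whole-brain volume of about 1.26 million cubic millimeters and the use of decimal prefixes below are assumptions made for illustration, not figures taken from the cited study.

```python
# Back-of-the-envelope check of the extrapolation above.
# Assumptions (not from the cited study): brain volume ~1.26e6 mm^3,
# decimal prefixes (1 ZB = 1e6 PB).

SAMPLE_VOLUME_MM3 = 1.0      # volume mapped in the 2024 Science study
SAMPLE_DATA_PB = 1.4         # mapping data for that sample, in petabytes
BRAIN_VOLUME_MM3 = 1.26e6    # assumed whole-brain volume

total_pb = SAMPLE_DATA_PB * (BRAIN_VOLUME_MM3 / SAMPLE_VOLUME_MM3)
total_zb = total_pb / 1e6    # 1 zettabyte = 1,000,000 petabytes (decimal)

print(f"Estimated whole-brain static map: {total_zb:.2f} ZB")  # ~1.76 ZB
```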

To create an ancestor simulation, it would be necessary to simulate not only every star in the sky but also the mental activities of every human who has ever lived. Attempting to replicate all neural activity from ancient times to the present on a supercomputer would demand energy that exceeds the physical limits of our universe, challenging the notion that we are living in a simulation [3].

However, one might argue that simulating mental activities does not always require extensive computation. A super-advanced civilization might only simulate minimal portions of the brain for most people, reserving significant computational resources for individuals engaged in deep thought. This raises an intriguing testable hypothesis: if billions of people simultaneously engage in reflective thinking, requiring full brain simulations, the computational load might overwhelm the simulation, causing delays reminiscent of frame rate drops in video games. Such anomalies could provide evidence supporting the simulation hypothesis.

Additionally, there are two counterarguments to the assertion that simulating human minds consumes excessive energy:

(1) On-Demand Rendering: In constructing an ancestor simulation, a super-advanced civilization might only render areas being actively observed. This is analogous to the “fog of war” in video games, where unobserved areas remain hidden, significantly reducing computational demands (a toy sketch of this idea appears after point (2) below).

(2) Selective Simulation: The creators might simulate only the neural activities of “you,” ensuring your subjective experience feels authentic while generating others’ responses through simplified models. Current large-scale AI models can already simulate human-like interactions; for a civilization capable of ancestor simulations, this task would be trivial, drastically reducing energy consumption.
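To make the on-demand rendering idea in point (1) concrete, here is a toy lazy-evaluation sketch in which regions of a world are only generated the first time they are observed. It is purely illustrative; nothing here is a claim about how a real ancestor simulation would be built.

```python
import random

class LazyWorld:
    """Only generates the detailed state of a region when it is observed."""
    def __init__(self, seed=42):
        self.rng = random.Random(seed)
        self.rendered = {}  # region -> detailed state, filled in lazily

    def observe(self, region):
        # Render the region the first time someone looks at it.
        if region not in self.rendered:
            self.rendered[region] = f"detail-{self.rng.randint(0, 10**6)}"
        return self.rendered[region]

world = LazyWorld()
print(world.observe("andromeda"))  # rendered only now
print(len(world.rendered))         # 1 -- everything else stays un-simulated
```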


▷ Source: Mart Biemans

Beyond energy considerations, another way to evaluate the simulation hypothesis is to detect inevitable computational errors. If we live in a simulation, modeling inaccuracies would eventually accumulate, creating observable inconsistencies—similar to how The Matrix required regular reboots. In video games, prolonged gameplay often reveals modeling errors that necessitate developer patches to maintain consistency. By analogy, if we live in a simulation, changes in fundamental constants, such as the speed of light, could serve as evidence. However, the brevity of human history and our understanding of physics limits our ability to detect such long-term changes. It is possible that such patches exist but operate on timescales of tens of thousands or even millions of years. Therefore, even though we have not observed changes in physical constants, we cannot completely rule out the possibility that we are living in a simulation.

An additional counterargument is that a super-advanced civilization might prohibit ancestor simulations for ethical reasons. While less robust, this argument reduces the likelihood of such simulations existing. The reasoning resembles the Drake Equation, which estimates the number of extraterrestrial civilizations capable of establishing communication by multiplying the rate at which stars form by the probability that life arises around them and the probability that intelligent life is willing to communicate.


▷ Source: University of Rochester
The Drake Equation, originally used to estimate the number of extraterrestrial civilizations, now includes:
N: The number of civilizations capable of creating simulations.
R*: The rate of star formation in the Milky Way.
fp: The fraction of stars with planetary systems.
ne: The average number of habitable planets per system.
fl: The fraction of habitable planets where life arises.
fi: The fraction of life-bearing planets where intelligent life evolves.
fc: The fraction of intelligent life willing to communicate or simulate.
L: The time such civilizations remain active.

By adding two new factors to the Drake Equation—first, the probability that intelligent life advances to the point of being able to simulate a world; second, the probability that a super-advanced civilization is not ethically barred from running ancestor simulations—we can, based on current technological and cultural development, estimate values for these new factors. This lets us assess how many civilizations might be willing to simulate worlds and, in turn, infer the probability that we are living in a simulation.
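A minimal sketch of this modified Drake-style estimate is shown below. Every factor value is an illustrative placeholder rather than an estimate from the article or the literature; the point is only to show how the two added factors enter the product.

```python
# Modified Drake-style estimate. All values are made-up placeholders.

R_star = 1.5      # star formation rate in the Milky Way (stars/year)
f_p    = 0.9      # fraction of stars with planetary systems
n_e    = 0.5      # habitable planets per system
f_l    = 0.1      # fraction of habitable planets where life arises
f_i    = 0.01     # fraction of those where intelligent life evolves
f_c    = 0.1      # fraction willing to communicate or simulate
L      = 1e6      # years such civilizations remain active

# Two added factors for the simulation argument:
f_sim   = 0.01    # probability of reaching simulation-capable technology
f_allow = 0.5     # probability the civilization is NOT ethically barred

N_simulators = R_star * f_p * n_e * f_l * f_i * f_c * L * f_sim * f_allow
print(f"Simulation-running civilizations (with these made-up numbers): {N_simulators:.2f}")
```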

4. Three Possibilities for a Simulated World

In his 2003 paper Are You Living in a Computer Simulation?, philosopher Nick Bostrom redefined the debate by proposing three possibilities that cover all potential scenarios:

First, humanity or other intelligent beings go extinct before achieving simulation technology.
Second, super-advanced civilizations capable of creating ancestor simulations either lack the will or face prohibitions against using such technology.
Third, we are currently living in an ancestor simulation.

To evaluate which of the first two possibilities is more likely, two critical questions arise. The first question examines the probability of life successfully reaching the singularity. For example, before humanity develops general artificial intelligence, could it face extinction due to catastrophic events like runaway climate change or nuclear war? On this issue, the scientific community provides some predictions, though the outcomes remain uncertain.

The second question delves into the societal structures and moral frameworks of super-advanced civilizations—a nearly impossible task. Just as a gorilla cannot grasp the utility humans derive from advanced technology, we cannot envision the motivations or priorities of beings capable of ancestor simulations. Perhaps such civilizations lack the drive to undertake the creation of simulations that, once initiated, require no further involvement. Alternatively, ethical considerations or cultural values might prohibit such endeavors altogether. However, our complete lack of insight into their reasoning makes assessing this possibility extremely challenging.

At its core, if we ourselves do not believe we are living in a simulation, there is little reason to assume that future post-human civilizations would conduct large-scale ancestor simulations. Conversely, if we think humanity could one day simulate an entire world, why not accept the likelihood that we are already part of such a simulation?

One way to approach this paradox is through Occam’s Razor, which suggests that the simplest explanation is often more likely than a complex one. Believing we live in a simulation assumes the existence of a super-advanced civilization, thereby adding unnecessary complexity to the model. The maxim “entities should not be multiplied without necessity” thus provides a compelling framework for addressing the simulation hypothesis. However, Occam’s Razor is not absolute; in some cases, scientific consistency, predictive power, or internal logic may require adopting a more complex model.


▷ Source: Claudio Araos Marincovic

5. Why Is Speculative Thinking Meaningful?

Readers who have reached this point might wonder: “Like Zhuangzi dreaming of a butterfly or the butterfly dreaming of Zhuangzi, why should we concern ourselves with such abstract questions when life is already fraught with its own challenges?” This line of inquiry may seem akin to the philosophical stance of agnosticism—ultimately irresolvable. However, our concerns and curiosities drive exploration and foster awe for the universe’s diversity and mysteries. Throughout the history of science, many revolutionary discoveries have emerged from profound questions about the essence of existence.

When we explore the possibility of a simulated universe, we are not only questioning the nature of reality but also sparking deeper discussions about technology, ethics, and existence itself. For example, this inquiry intersects with advancing technologies such as virtual reality (VR) and mixed reality (MR), raising questions about how we treat virtual beings powered by AI-driven models. As we approach the advent of general artificial intelligence (AGI), such discussions also highlight concerns about losing control over intelligent agents and whether these agents might attempt to escape the simulated environments we create for them. Speculating on whether we are living in a simulation inspires research into AI alignment while also deepening our understanding of existence.

In the future, as technology advances and humanity’s understanding of the universe grows, we may approach these questions with entirely new perspectives. More detailed observations, refined data analyses, and novel technological methods could help us test these hypotheses. Yet, in the pursuit of truth, maintaining curiosity and humility toward the unknown remains among the most enduring values in scientific inquiry.

Hearing the Silent Soul: Consciousness in Vegetative State Patients

Introduction: Using advanced neuroimaging technologies such as fMRI and EEG, scientists have identified subtle signs of consciousness in patients previously diagnosed with a “vegetative state” (VS). This breakthrough not only challenges the traditional understanding of the vegetative state but also highlights the concept of “covert consciousness” or “cognitive-motor dissociation.”

In Edward Yang’s film Yi Yi, an elderly woman falls into a coma following an accidental fall. Her caregiver advises the family to talk to her daily to aid recovery. But does speaking to a comatose individual truly make a difference? Can such patients actually hear, or even retain any degree of consciousness?

1. Conscious Responses in the “Vegetative State”

The question of whether comatose individuals retain consciousness has sparked debate since the 1990s.

Modern medicine traditionally defines the vegetative state as a condition characterized by a lack of conscious awareness and unresponsiveness. Yet, research indicates that 13–19% of patients diagnosed with VS or coma exhibit conscious responses to external stimuli.


▷ Owen, Adrian M., et al. “Detecting awareness in the vegetative state.” Science 313.5792 (2006): 1402.

A groundbreaking 2006 study published in Science (Owen et al., 2006) reported a compelling case [1]: a 23-year-old female patient in a vegetative state following a traffic accident demonstrated neural activity indistinguishable from that of healthy individuals when exposed to linguistic stimuli. Strikingly, when presented with sentences containing ambiguity, her left ventrolateral prefrontal cortex displayed neural responses typical of language comprehension.

Even more remarkably, the patient exhibited evidence of “imaginative behavior.” When instructed to imagine specific actions such as playing tennis or walking (motor imagery), her neural activity was indistinguishable from that of healthy controls.


▷ The study monitored the activity of the supplementary motor area (SMA) in both the patient and a control group of 12 healthy volunteers during the process of imagining playing tennis. When imagining moving around her home, the researchers detected neural activity in the parahippocampal gyrus (PPA), posterior parietal cortex (PPC), and premotor cortex (PMC).

Despite meeting the clinical criteria for a vegetative state, her ability to comprehend and follow instructions suggested a preserved awareness of herself and her environment. This study underscored that neural activity in response to stimuli can serve as a marker of consciousness, while its absence does not necessarily confirm unconsciousness.

Previously, it was widely believed that comatose patients either progressed to brain death or transitioned into a vegetative state—both thought to lack consciousness entirely. However, Adrian Owen’s findings introduced the possibility of intermediary states, such as the “minimally conscious state” (MCS) or “covert consciousness.”

This raises a profound question: how can we accurately determine whether a person, especially one unable to communicate verbally or behaviorally, retains consciousness?

2. Determining the Presence of Consciousness

To assess the presence of consciousness in individuals with severe neurological conditions, it is crucial to first understand the classifications of death. Death is generally divided into two main categories: legal death (the cessation of legal personhood) and natural death, which itself encompasses three progressive stages: clinical death, brain death, and biological death [2].


The vegetative state (VS) is a condition that resembles these stages but retains certain physiological functions. According to the Merck Manual of Diagnosis and Therapy [3]:

The vegetative state is a chronic condition in which patients can sustain blood pressure, respiration, and cardiac functions but lack higher cognitive abilities. While the hypothalamus and medullary brainstem functions remain intact to support cardiopulmonary and autonomic systems, the cortex suffers severe damage, eliminating cognitive function. Despite this, the reticular activating system (RAS) remains functional, allowing wakefulness. Midbrain or pontine reflexes may or may not be present. Typically, such patients lack self-awareness and interact with their environment only via reflexive responses.

However, findings from Owen’s team demonstrate that patients diagnosed as being in a vegetative state may indeed possess self-awareness and can even interact with their environment autonomously through neural activity. For example, patients have been shown to imagine actions, such as playing tennis, in response to instructions from clinicians. This neural activity, which can be detected using fMRI or EEG, indicates the presence of consciousness despite the absence of observable physical behavior. This phenomenon is referred to as “covert consciousness.”

3. Cognitive-Motor Dissociation: A New Understanding of Consciousness

Building on Adrian Owen’s groundbreaking research, subsequent studies have confirmed the phenomenon of “covert consciousness” in patients previously classified as being in a vegetative state. These findings, often based on motor imagery techniques, have identified a condition termed Cognitive-Motor Dissociation (CMD).

CMD describes a state in which patients demonstrate measurable neural responses to external stimuli, including motor imagery comparable to that of healthy individuals. However, due to motor system impairments, they are unable to execute corresponding physical actions.


▷ Schiff, Nicholas D. “Cognitive motor dissociation following severe brain injuries.” JAMA neurology 72.12 (2015): 1413-1415.

In 2015, Nicholas D. Schiff’s team used advanced imaging techniques such as Dynamic Causal Modeling (DCM) and Diffusion Tensor Imaging (DTI) to assess the functional integrity of brain networks supporting motor imagery in these patients. Their research aimed to uncover the neural mechanisms underlying Cognitive-Motor Dissociation [4].

The study revealed that in healthy individuals, excitatory coupling between the thalamus and motor cortex is critical for executing motor behaviors. However, in patients with covert consciousness, analyses using DCM and DTI demonstrated a selective disruption of this coupling. This disruption explains the dissociation between preserved motor imagery and absent skeletal movement. It is akin to operating the mechanical arm of an excavator when the circuit between the control panel and the arm is severed—you can still issue commands, but the arm cannot carry out the movements.

*DCM: A neuroimaging technique that examines causal relationships and connectivity between brain regions, providing insights into their functional interactions.
DTI: An MRI-based method for mapping the direction and integrity of white matter tracts, offering a detailed view of brain structural connectivity and its changes.

4. Research on the Prevalence of Cognitive-Motor Dissociation

Patients with Cognitive-Motor Dissociation (CMD) are often misdiagnosed as being in a vegetative state. However, there is currently no standardized method for accurately identifying CMD. The mechanism identified by Schiff’s team, discussed earlier, may not apply universally to all CMD cases. Moreover, previous studies on CMD have generally been small-scale, fragmented, and lacking consistency, with limited sample sizes. Under these conditions, developing a globally applicable standard for patient assessment is challenging, let alone designing targeted treatment approaches for CMD.

To address this issue, the paper titled “Cognitive Motor Dissociation in Disorders of Consciousness”, [5] supported by the Tianqiao Chen Institute-MGH Research Scholar Award, offers a potential solution. This study, published in The New England Journal of Medicine in August 2024, features Chen Scholar Brian L. Edlow as a key participant.


▷ Bodien, Yelena G., et al. “Cognitive motor dissociation in disorders of consciousness.” New England Journal of Medicine 391.7 (2024): 598-608.

This study recruited patients from diverse settings, including intensive care units, rehabilitation centers, nursing homes, and community populations. A unified inclusion criterion was applied: participants had no prior history of neurological or psychiatric disorders and were physiologically capable of undergoing brain imaging techniques, including MRI and EEG.

The research analyzed 353 adult patients diagnosed with disorders of consciousness using motor imagery tasks assessed via fMRI and EEG. Among these, 241 patients were classified as being in a coma, a vegetative state, or a minimally conscious state (MCS) based on the internationally recognized Coma Recovery Scale-Revised (CRS-R). Notably, 25% (60 patients) of this group exhibited CMD, characterized by task-based neural responses despite the absence of behavioral responses. Additionally, 112 patients were diagnosed as being in an MCS+ state or as having recovered from MCS. Among this subgroup, 43 patients (38%) exhibited observable task-based responses to commands detected via fMRI, EEG, or both.

*Minimally conscious states are further categorized as MCS− and MCS+, based on language processing abilities: MCS+ patients demonstrate simple command-following, intelligible verbal expression, or conscious but non-functional communication. MCS− patients exhibit more limited behaviors, such as visual pursuit, localization to noxious stimuli, or basic spontaneous activities like grasping bedsheets.

This study reported a higher prevalence of CMD (25%) than previous research, which estimated that 10–20% of patients with disorders of consciousness exhibit CMD symptoms. While traditional behavioral metrics like CRS-R offer valuable insights into assessing patient consciousness and responsiveness, task-based neuroimaging technologies significantly enhance diagnostic accuracy. Notably, the combined use of EEG and fMRI further improves measurement precision. These findings suggest that the prevalence of CMD may have been underestimated in prior studies.

5. The Significance of Measuring Consciousness in the Vegetative State

Imagine being aware of everything around you—the hum of medical equipment by your bedside, the whispers of family and friends—yet being unable to speak, move, or respond in any way. This state of isolation and helplessness could lead one to abandon even the faintest will to live. Although the phenomenon of “living dead” occurs in only a minority of vegetative state patients, every individual, no matter how rare their condition, represents a unique and precious life.

Whether driven by technological advancement or humanistic ethics, we need a measurement method that is more accessible, precise, and cost-effective to determine the consciousness state of comatose patients. The research conducted by Y.G. Bodien and colleagues stands out by integrating patient data from multiple locations and unifying measurement methodologies through brain imaging techniques. This unprecedented approach has maximized the accuracy of identifying Cognitive-Motor Dissociation (CMD) patients and pointed the way forward for future medical technologies and treatments. Most importantly, it offers a renewed sense of hope to those trapped in a silent world.

Why Can’t Musk’s “Blind Vision” Surpass Natural Sight?

Perhaps I can best illustrate by imagining what I should most like to see if I were given the use of my eyes, say, for just three days. And while I am imagining, suppose you, too, set your mind to work on the problem of how you would use your own eyes if you had only three more days to see. If with the oncoming darkness of the third night you knew that the sun would never rise for you again, how would you spend those three precious intervening days? What would you most want to let your gaze rest upon?
— Helen Keller

In our vibrant world, we often lament Helen Keller’s blindness and are deeply moved by her longing for the beauty she could never see. However, the reality is that over 2.2 billion people suffer from visual impairment or blindness worldwide. Advances in visual cortex restoration technologies, particularly visual cortical prosthetics, offer a glimmer of hope for these individuals. Yet, even if partial vision is restored, can we truly understand what their world looks like?

Currently, three clinical trials of visual cortical prosthetics are underway. One uses surface electrodes (the Orion device by Second Sight Medical Products [1,2]), while the other two employ deep electrodes [3,4]. However, predicting the extent of restored vision remains a challenge. Furthermore, the field of neural implants heavily depends on intuition and experience, which introduces the risk of perceptual biases due to subjective interpretations.

A recent article in Scientific Reports [5] by Ione Fine and Geoffrey M. Boynton from the University of Washington introduces a “virtual patient” model grounded in the architecture of the primary visual cortex (V1). This model accurately predicts perceptual experiences reported in various studies on human cortical stimulation. Simulations reveal that increasing the number of smaller electrodes does not always improve visual restoration. In the near term, the quality of perception provided by cortical prosthetics seems more constrained by the visual cortex’s neurophysiological structure than by technological limitations.
 

1. Advances and Challenges in Vision Restoration Technologies

Vision restoration technologies are advancing rapidly worldwide, with at least eight teams developing retinal electronic implants. Among these, two devices have been approved for patient use [6–12], while others are currently in clinical trials [13]. Another promising approach is optogenetics, which has shown limited but encouraging results in early clinical trials. Gene therapy, particularly for conditions like Leber congenital amaurosis, has already received clinical approval, and many additional gene therapies are under active development. Furthermore, retinal pigment epithelium and stem cell transplantation are progressing, with several phase I/II trials ongoing. Many other promising therapies are also being explored.

However, these approaches are limited to retinal interventions and cannot address conditions like retinal detachment or diseases causing irreversible damage to retinal ganglion cells or the optic nerve, such as congenital childhood glaucoma. This limitation has driven significant interest in vision restoration by directly targeting the brain’s cortical visual centers. Since 2017, three clinical trials for visual cortical prosthetics have been initiated. These trials, grounded in decades of research on cortical stimulation (see Table 1), have examined both short-term and long-term effects. However, the findings so far are primarily descriptive.


▷ Table 1: Studies describing perceptual effects of human cortical electrical stimulation.

Predicting the level of vision restoration achievable with visual cortical prosthetics before human implantation remains a significant challenge. The field of neural implants continues to rely heavily on experience and assumptions. For example, it seems logical that increasing the number of smaller electrodes would enhance resolution. But is this assumption valid?

A team led by Ione Fine [14] developed a computational “virtual patient” model based on the neurophysiological structure of the primary visual cortex (V1). This model accurately predicted perceptual outcomes from electrical stimulation, including the position, size, brightness, and spatiotemporal shape of elicited visual perceptions. Such predictions provide a valuable tool for anticipating the perceptual effects of visual cortical prosthetic implants before their application in humans.
 

2. Addressing Data Shortage with a “Virtual Patient” Model

Building on previous models of retinal prosthetic stimulation, Ione Fine’s team has developed an innovative theoretical framework. The model integrates currents from cortical implants over time, converting them into neural signal intensities. The resulting perception from neuronal stimulation is based on a linear summation of each cell’s receptive field, adjusted by the neural signal intensity at each specific location and moment. Despite its mathematical simplicity and lack of complex parameters, the model accurately predicts a range of data on cortical stimulation.


▷ Fine, Ione, and Geoffrey M. Boynton. “A virtual patient simulation modeling the neural and perceptual effects of human visual cortical stimulation, from pulse trains to percepts.” Scientific Reports 14.1 (2024): 17400.

(1) Temporal Conversion from Pulse Trains to Perceptual Intensity

To design the model, Ione Fine’s team first employed a rapid temporal integration phase, capturing the immediate response of cells to current as a measure of “spike activity intensity.” They also introduced a refractory period, positing that cells require a brief interval after activation before they can respond to subsequent stimuli. This phase is followed by a slower integration process, incorporating a compressive non-linear relationship, as shown in Figure 1.

▷ Figure 1. Schematic of the conversion from pulse trains to perceptual intensity over time

The team implemented this process using a straightforward single-stage leaky integrator. In this model, the depolarization rate depends on both the current depolarization level and the input current, meaning that variations in current directly influence the rate of depolarization. In the model’s first phase, the “spike response intensity” reflects the collective recruitment of spikes from various cell populations with differing activation thresholds. The compressive non-linear function not only accounts for saturation effects in these populations but also mirrors complex cortical gain control mechanisms.
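The following sketch illustrates the general shape of such a pipeline: a fast leaky-integration stage applied to a pulse train, a slower integration stage, and a compressive non-linearity. The parameter values and functional forms are placeholders chosen for illustration (the refractory period is omitted); they are not taken from Fine and Boynton's implementation.

```python
import numpy as np

def leaky_integrator(signal, dt, tau):
    """Single-stage leaky integrator: dy/dt = (-y + signal) / tau (Euler step)."""
    y = np.zeros_like(signal)
    for t in range(1, len(signal)):
        y[t] = y[t - 1] + dt * (-y[t - 1] + signal[t]) / tau
    return y

def pulse_train(duration=0.5, dt=1e-4, freq=50, pulse_width=2e-4, amp=1.0):
    """Toy rectangular pulse train at `freq` Hz (monophasic; real stimuli are biphasic)."""
    t = np.arange(0, duration, dt)
    train = np.zeros_like(t)
    period = 1.0 / freq
    train[(t % period) < pulse_width] = amp
    return t, train

dt = 1e-4
t, pulses = pulse_train(dt=dt)

spike_activity = leaky_integrator(pulses, dt, tau=0.001)        # fast stage
slow_response = leaky_integrator(spike_activity, dt, tau=0.1)   # slower stage
perceived_intensity = slow_response ** 0.5                      # compressive non-linearity

print(f"Peak predicted perceptual intensity: {perceived_intensity.max():.4f}")
```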

(2) Ocular Dominance Columns, Orientation Pinwheels, and Receptive Fields

The primary visual cortex (V1) contains complex, structured neural arrangements that influence how visual information is perceived and processed. Building on the work of Rojer and Schwartz, Figure 2 illustrates these simulated structures. The orientation columns (Figure 2B), sensitive to visual direction, were generated by band-pass filtering random white noise to reflect neurons’ directional preferences. Fine’s team further enhanced the model by incorporating ocular dominance columns (Figure 2C), which add directional gradients to the same noise signal, resulting in orthogonally arranged ocular dominance and orientation columns. These structures closely mimic maps observed in both macaques and humans.

A single receptive field (Figure 2F) is the basic unit in the visual system responsible for receiving and processing light signals. It is generated using a simple model through the additive combination of “ON” and “OFF” subunits, with the spatial separation of the subunits drawn from a unimodal distribution. The same band-pass filtered white noise used to generate the orientation and ocular dominance maps was also used to generate the maps controlling the separation of the “ON” and “OFF” subunits (δ_on–off) and their relative strength (w_on–off).

Predicted phosphenes are produced by summing receptive field contours at each cortical location, with intensity weighted according to the level of electrical stimulation. Consequently, the stimulation intensity at an electrode directly influences the perceived phosphene’s brightness and size.


▷ Figure 2. Schematic of the cortical model (A) Transformation from visual space to cortical surface. (B) Orientation pinwheel map. (C) Ocular dominance columns. (D) Spatial separation of ON and OFF subunits. (E) Relative strength of ON and OFF. (F) Receptive field size.
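As a rough illustration of how band-pass filtered white noise can yield pinwheel-like orientation maps and column-like structure, consider the sketch below. The filter settings and the way the dominance map is derived here are simplified assumptions for demonstration, not the construction used in the paper.

```python
import numpy as np

def bandpass_filtered_noise(size=128, k_center=0.15, k_width=0.03, seed=0):
    """Complex white noise passed through an annular (band-pass) frequency filter."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((size, size)) + 1j * rng.standard_normal((size, size))
    fx = np.fft.fftfreq(size)
    kx, ky = np.meshgrid(fx, fx)
    k = np.sqrt(kx**2 + ky**2)
    annulus = np.exp(-((k - k_center) ** 2) / (2 * k_width**2))
    return np.fft.ifft2(np.fft.fft2(noise) * annulus)

z = bandpass_filtered_noise()
orientation_map = np.angle(z) / 2.0   # preferred orientation, pinwheel-like layout
ocular_dominance = np.real(z) > 0     # toy left/right-eye column pattern

print(orientation_map.shape, ocular_dominance.mean())  # ~half of units per "eye"
```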

(3) Phosphene Thresholds and Brightness as Functions of Electrical Stimulation Temporal Characteristics

Ione Fine’s team compared the model’s predictions with data on current amplitude thresholds and brightness ratings measured across various pulse sequences. The model accurately described how pulse sequences are converted into perceptual intensity, successfully predicting phosphene thresholds and brightness ratings across different pulse parameters, electrode locations, and electrode sizes. This suggests that the model can reliably estimate patients’ perceptual experiences regardless of changes in stimulation frequency, pulse width, or the specific location and size of electrodes.

*Phosphene threshold refers to the minimum current intensity required to produce a visible phosphene, while brightness rating is patients’ subjective evaluation of the phosphene’s brightness.

(4) Relationship Between Phosphene Size, Current Amplitude, and Eccentricity

Additionally, the model developed by Ione Fine’s team [1] successfully predicted how phosphene size varies with current amplitude and visual field eccentricity. The study found that as current amplitude increases, phosphene size also increases, closely aligning with patient perception data. This indicates that current amplitude is a key factor in determining phosphene size. Furthermore, the model demonstrated that phosphene size grows with visual field eccentricity, meaning that electrical stimulation at the periphery produces larger phosphenes than those in the central area.

(5) Shape Recognition

In the study, the team compared perceptual experiences generated by simultaneous versus sequential electrode stimulation. The results showed that when multiple electrodes were stimulated simultaneously, the model failed to correctly recognize complete letter shapes, mirroring patients’ actual experiences. However, when electrodes were stimulated sequentially in the order of writing, the model accurately recognized letter shapes. This highlights a critical point: simultaneous stimulation makes it difficult to group and interpret phosphenes correctly, whereas sequential stimulation facilitates the formation of recognizable shapes.

This phenomenon may arise from the absence of the Gestalt effect in the model. Gestalt psychology posits that the whole is greater than the sum of its parts, meaning our perceptual system integrates dispersed phosphenes into meaningful wholes. However, since the model does not account for electric fields or complex neural spatiotemporal interactions, phosphenes cannot be effectively grouped during simultaneous stimulation, leading to challenges in shape recognition. Sequential stimulation, by temporally separating stimuli, reduces interference between phosphenes and allows the perceptual system to better integrate information and recognize shapes accurately.

(6) Using “Virtual Patients” to Predict Perceptual Outcomes of New Devices

The ability of Ione Fine’s team’s model to replicate extensive data suggests it can provide insights into the potential perceptual experiences of new technologies—one of the primary uses of the “virtual patient” model.


Figure 3 examines the perceptual experiences produced by different electrode sizes using the “virtual patient” model. In Figure 3A, the simulated array employs extremely small electrodes (tip areas ranging from 500 to 2,000 μm²). The model predicts visual phosphenes that align with preliminary experimental data: phosphenes from adjacent electrodes cannot be clearly distinguished. This matches preliminary observations from patients, indicating that when electrode spacing ranges from 0.4 to 1.85 millimeters, both single and multiple electrode stimulations produce irregularly shaped phosphenes. This implies that extremely small electrodes may simultaneously stimulate neuronal populations tuned to similar directions, resulting in elongated or complex phosphene structures.

Figures 3B and 3C further explore the impact of electrode size on patient perception. For small electrodes with limited current diffusion (cortical tissue stimulation radius less than 0.25 millimeters), the model predicts complex phosphene structures, as shown in the upper part of Figure 3B. At this stage, electrode size has minimal impact on phosphene appearance or size. When the electrode radius ranges from 0.25 millimeters to 1 millimeter, phosphenes begin to resemble “Gaussian blobs,” but their size is still primarily determined by receptive field size rather than the stimulated area. Only when the electrode radius exceeds 1 millimeter does electrode size significantly influence phosphene size.

Crucially, the model indicates that throughout the visual field, receptive fields impose a physiological “lower limit” on phosphene size. Specifically, reducing the stimulation area’s radius below 0.5 millimeters may not significantly enhance vision and might instead render phosphenes difficult to interpret.


▷ Figure 3. Using “Virtual Patients” to Predict Perceptual Outcomes
(A) Simulated perception with very small deep electrode arrays. The lower left panel shows electrode positions and sizes in the array. The upper right panel displays example perceptions from three individual electrodes. The lower panel shows predicted outcomes when electrode pairs are stimulated simultaneously. (B, C) Simulated predicted phosphene shapes and sizes under different electrode sizes and cortical locations. The narrow shaded area in panel C represents the 5–95% confidence interval.

Similarly, the question arises: does having more electrodes lead to better vision restoration?

Ione Fine’s team simulated perceptual outcomes for three different electrode array configurations through Figure 4, revealing the complexity of this issue.

Figure 4A shows electrodes arranged in a regular pattern within the visual space. This arrangement produces sparse, small phosphenes in the foveal region (the retina area responsible for high-resolution vision), significantly underestimating its perceptual capability. This means that despite an increased number of electrodes, the phosphene distribution in the foveal region is not dense enough to fully utilize its high-resolution potential.


▷ Figure 4. Comparison of Simulated Electrode Array Configurations
(A) Regularly spaced electrodes in the visual field. (B) Regularly spaced electrodes on the cortical surface. (C) ‘Optimal’ spacing.

Figure 4B shows electrodes arranged regularly on the cortical surface. This configuration leads to an excessive concentration of electrodes in the foveal region, causing significant overlap between receptive fields. Consequently, these overlapping receptive fields do not substantially improve resolution. In fact, electrodes near the foveal region almost project to the same visual spatial location, making it difficult for patients to perceive positional changes even with slight electrode movements. This finding challenges the traditional view that the extensive expansion of the foveal region in V1 supports higher spatial sampling capabilities.

Figure 4C presents an “optimal” electrode configuration, where electrode spacing is designed so that the center-to-center distance of stimulated phosphenes maintains a fixed ratio with phosphene size. Since receptive field size increases linearly with visual field eccentricity and cortical magnification changes logarithmically, this configuration results in a more dispersed electrode distribution in the foveal region compared to peripheral areas. The results indicate that electrodes should be more sparsely arranged in the foveal region rather than densely packed, contrary to common intuition.
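The spacing rule itself is easy to state in code: place successive phosphene centers so that the gap between neighbors is a fixed multiple of the local phosphene size, which is assumed to grow linearly with eccentricity. The slope, intercept, and spacing ratio below are illustrative placeholders, not parameters from the study.

```python
import numpy as np

def phosphene_size(ecc_deg, slope=0.2, intercept=0.1):
    """Assumed linear growth of phosphene size (deg) with eccentricity (deg)."""
    return intercept + slope * ecc_deg

def optimal_centers(max_ecc=20.0, spacing_ratio=1.0, start=0.5):
    """Centers spaced by spacing_ratio * local phosphene size."""
    centers = [start]
    while centers[-1] < max_ecc:
        centers.append(centers[-1] + spacing_ratio * phosphene_size(centers[-1]))
    return np.array(centers[:-1])  # drop the point that overshoots max_ecc

centers = optimal_centers()
print("electrode count:", len(centers))
print("eccentricities (deg):", np.round(centers, 2))
print("local phosphene sizes (deg):", np.round(phosphene_size(centers), 2))
```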

The simulation results from Ione Fine’s team demonstrate that excessively dense electrode arrangements in the foveal region do not enhance perceptual resolution. Instead, based on neurophysiological constraints, appropriately distributing electrode positions and spacing is essential to maximize the perceptual efficacy of visual cortical prosthetics.
 

3. Insights from the “Virtual Patient” Model

The “virtual patient” model tackles a fundamental challenge in vision restoration: predicting patients’ perceptual experiences before device implantation. This model enables researchers to evaluate how different electrode designs and stimulation parameters might impact visual perception, all without actual implantation.

Additionally, the model challenges a common misconception: increasing electrode numbers or reducing their size does not necessarily improve visual perception and may actually complicate perceptual experiences.

Ione Fine’s model identifies three main factors limiting the spatial resolution of cortical implants: cortical magnification, receptive field structure, and electrode size. Receptive field size is strongly linked to cortical magnification: in most cortical areas, receptive field area scales roughly as cortical magnification raised to the −2/3 power. This relationship supports prior findings by Bosking and colleagues, who noted that patient-drawn phosphene sizes could be predicted by cortical magnification. In the foveal region, where cortical magnification peaks, receptive fields are smallest, with radii ranging from 0.02 to 0.5 degrees.

Data and simulations show that for a fixed electrode size, phosphene size increases linearly with eccentricity. When electrode radii are smaller than 0.25 millimeters, this relationship primarily reflects the increase in receptive field size with eccentricity. For larger electrodes, both cortical magnification and the size of the stimulated cortical area become significant factors in determining phosphene size.

With smaller electrodes, receptive field size primarily limits visual acuity. If neurons with very small receptive fields could be selectively stimulated, closer electrode spacing in the foveal region could enhance spatial resolution. However, human vision can resolve details far finer than a single receptive field width. This fine discrimination depends on interpreting complex response patterns across neuron populations with diverse receptive fields, rather than on individual neurons alone.

In summary, Ione Fine’s simulations suggest that, in the near future, the spatial resolution of visual cortical prosthetics will likely be constrained more by the neurophysiological structure of the visual cortex than by engineering limits. This implies that enhancing vision restoration outcomes will require a deeper understanding and application of visual cortex neurophysiology, rather than simply increasing electrode numbers or reducing their size.
 

4. Current Limitations of the “Virtual Patient” Model

The advent of the “virtual patient” model has transformed our understanding of retinal implant surgeries. While modeling techniques have long simulated the effects of electrical stimulation on local tissues, such as current diffusion from electrodes, predicting perceptual outcomes requires expanding virtual models to include core physiological principles.

The “virtual patient” model developed by Ione Fine’s team has effectively predicted a range of perceptual outcomes from cortical electrical stimulation, suggesting it may serve as a useful approximation for future retinal or cortical implants. This model could guide future research using more reliable subject data.

Despite its utility, the current model has several limitations. First, it uses current amplitude as an input parameter, though a more precise approach would employ current density (current intensity divided by electrode area) to better capture current distribution and its effects within tissue.

Second, the model assumes that electrodes are perfectly aligned with the cortical surface. In reality, electrodes may not sit flush, and even slight tilts can result in only the edges driving neural responses effectively.

Third, the model serves only as an approximation and may not be suitable for long-term stimulation protocols. Fourth, it excludes electric fields and nonlinear neural interactions. Fifth, it assumes perception is a simple average across receptive fields. An alternative could involve representing each neuron by its “optimal reconstruction filter”—the cell’s specific role in reconstructing natural images within the neural network.

Finally, the current model only includes the V1 cortical area. Due to the cortical surface’s configuration, electrode implantation is much easier at higher-level visual areas, such as V2 or V3. Many components of the model, including the transformation from visual space to the cortical surface, could be easily extended to these higher visual areas. The model could also be expanded to incorporate the receptive fields of V2 or V3 neurons. However, the complexity of V2-V3 receptive field structures, combined with a lack of cortical stimulation data from V2 or V3 electrodes, means that any such extension of the model remains highly speculative at present.

5. Future Outlook

In the future, “virtual patient” models may serve multiple critical functions. For researchers and companies, they can quantitatively evaluate our level of understanding of this technology. Given the difficulties in collecting behavioral and cortical data, model-driven approaches can help guide experiments to produce the most meaningful insights, thereby optimizing the allocation of research resources.

Another important application is predicting the visual quality a specific implant might provide. In this study, Ione Fine’s team used qualitative assessments of perceptual quality to evaluate different array configurations. A more rigorous approach might involve perceptual testing; for example, asking visually normal individuals to perform perception tasks with simulated prosthetic vision. Alternatively, simulations could serve as inputs for decoders trained to reconstruct the original image. Recent cortical simulators display characteristics similar to those in more complex models.

Finally, “virtual patients” can guide the development of new technologies. The current model, for example, suggests that small electrode sizes and dense implantation in the foveal region provide limited advantages. Additionally, “virtual patients” could help generate optimized training sets for deep learning-based prosthetic vision, aiding in the identification of optimal stimulation patterns for existing implants. Similar retinal stimulation models are currently used to simulate and enhance prosthetic vision in virtual reality environments by generating preprocessed training data for deep learning.

For agencies like the FDA and insurance providers, these models can inform essential visual tests for device evaluation, helping to establish scientific standards that ensure the safety and effectiveness of new vision restoration technologies. Finally, for surgeons and patients’ families, these models provide more realistic expectations for perceptual outcomes.

“Fantasizing” May Not Be Able to Produce “Truth”

The learning capabilities of machines have become commonplace for us, and some may even believe that “machines can learn anything as long as there is enough data.” However, the learning we typically discuss is based on a framework of “external observation.” In addition to common learning scenarios such as reading and attending lectures, there is another type of learning that occurs deep within our minds (internal thinking). This is known as “learning by thinking” (LbT).

Consider that this phenomenon is ubiquitous in daily life: scientists gain new insights through thought experiments, drivers discover how to navigate around obstacles through mental simulation, or writers acquire new knowledge while trying to express their ideas. In these examples, learning occurs without any new external input. With the advancement of computer science, the LbT phenomenon has become increasingly pronounced.

For instance, GPT-4 corrected a misunderstanding during its explanation process and arrived at the correct conclusion without any external feedback. Similarly, when large language models (LLMs) are prompted to “think step-by-step” or simulate a chain-of-thought [1], they can provide more accurate answers without external corrections.


▷ Screenshot of a GPT-4 conversation. Source: GPT
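The “think step-by-step” manipulation can be illustrated with a minimal prompting sketch. The `call_model` function below is a hypothetical placeholder for whatever LLM client is available; no specific vendor API is assumed.

```python
# Minimal illustration of chain-of-thought prompting: the same question,
# with and without a "think step by step" instruction.

def build_prompt(question: str, chain_of_thought: bool) -> str:
    if chain_of_thought:
        return f"{question}\nLet's think step by step, then state the final answer."
    return f"{question}\nAnswer:"

def call_model(prompt: str) -> str:
    # Placeholder: plug in your own LLM client here.
    raise NotImplementedError

question = ("A bat and a ball cost $1.10 in total. "
            "The bat costs $1.00 more than the ball. How much does the ball cost?")

direct_prompt = build_prompt(question, chain_of_thought=False)
cot_prompt = build_prompt(question, chain_of_thought=True)
print(cot_prompt)
# The LbT claim: the second prompt often yields more accurate answers even
# though the model receives no new external information.
```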

However, LbT is often considered a “paradox.” On one hand, in a certain sense, the learner does not acquire new information—they can only utilize elements that already exist in their mind. On the other hand, learning does indeed occur—the learner gains new knowledge (such as the answer to a math problem) or new abilities (such as the capacity to answer new questions or perform new reasoning).

Recently, Professor Tania Lombrozo of the Concept and Cognition Laboratory in Princeton University’s Department of Psychology published a review in Trends in Cognitive Sciences that offers a solution to this paradox [2]. By analyzing four specific modes of learning—explaining, simulating, comparing, and reasoning—the article reveals the similar computational problems and solutions underlying both human and artificial intelligence: both leverage the re-representation of existing information to support more reliable reasoning. She concludes that, for resource-limited systems such as humans, LbT helps extract and process the information needed for the task at hand without constantly generating new information.

So, can the results produced by this almost “fantasizing” form of learning be considered “truth”? Before drawing a conclusion, let us first review the evidence supporting LbT.


▷ Lombrozo, Tania. “Learning by thinking in natural and artificial minds.” Trends in Cognitive Sciences (2024).
 

1. Four Typical Modes of Learning by Thinking

Let us first explore four instances of LbT: learning through explaining, simulating, analogical reasoning, and reasoning. Of course, LbT is not limited to these four modes—on one hand, there are other forms of LbT, such as learning through imagination; on the other hand, these learning modes can be further subdivided. For example, when discussing learning through explaining, there is a distinction between teleological explanations and mechanistic explanations.

In fact, due to the intricate relationships between the learning process and other cognitive mechanisms such as attention and memory, it is challenging to categorize learning fundamentally. Instead, the four learning processes we discuss help us understand the ubiquity and strengths of LbT. This allows for parallel comparisons between human thinking and artificial intelligence.


▷ Figure 1. Different Types of Learning. Source: [2]
 
(1) Learning through Explaining

In a classic study [3], researchers found that “high-achieving” students employed different strategies compared to “low-achieving” students when learning material. High-achieving students tended to spend more time explaining the text and examples to themselves. Subsequent studies that prompted or guided students to engage in explanations showed that explanatory learning indeed enhanced learning outcomes, particularly for questions that extended beyond the learning material itself. When external feedback is absent during the learning process, explanatory learning becomes a form of LbT. This learning mode can be divided into two categories: corrective learning and generative learning.

In corrective learning, learners identify and improve flaws in their existing representations. For example, the phenomenon known as the “illusion of explanatory depth (IOED)” demonstrates that people often overestimate their understanding of how devices work. It is only after attempting to explain that they become aware of the limitations of their understanding [4].

Generative learning refers to the process of learners constructing new representations through explanation. For instance, when learning new categories, participants who were prompted to explain were more likely to generate abstract representations and identify broad patterns within examples.

A similar approach is applied in AI research, where AI systems generalize by generating self-explanations that extract information from training sets. Recent studies have also shown that deep reinforcement learning systems that generate natural language explanations alongside task answers perform better in relational and causal reasoning tasks than systems that do not generate explanations or those that use explanations as input. These systems do not rely solely on simple features but can summarize generalizable information from complex data.


▷ Source: GPT

(2) Learning through Simulating

Imagine three gears arranged horizontally. If the leftmost gear rotates clockwise, in which direction will the rightmost gear turn?

Most people would solve this problem through “mental simulation,” constructing a mental image of the gears’ rotation. Throughout the history of science, thought experiments in many fields have been classic examples of mental simulation. For example, Einstein explored relativity by simulating the movement of photons and trains, while Galileo studied gravity by simulating the process of objects falling. These mental simulations and thought experiments provide profound insights without relying on new external data.
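
For readers who want the gear puzzle worked out explicitly, the following toy sketch (included purely for illustration, not taken from the cited literature) externalizes the mental simulation: adjacent meshed gears turn in opposite directions, so direction simply alternates along the train.

```python
# Toy "external" version of the mental simulation for the gear puzzle:
# adjacent meshed gears rotate in opposite directions, so direction alternates.

def gear_directions(n_gears: int, first: str = "clockwise") -> list[str]:
    flip = {"clockwise": "counterclockwise", "counterclockwise": "clockwise"}
    directions = [first]
    for _ in range(n_gears - 1):
        directions.append(flip[directions[-1]])
    return directions

print(gear_directions(3))
# ['clockwise', 'counterclockwise', 'clockwise']
# With three gears, the rightmost gear turns clockwise, like the leftmost.
```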

Similar to explanatory learning, mental simulation can be either corrective or generative.

A corrective example of mental simulation comes from a study in which participants worked through two kinds of thought experiments: one consistent with Newtonian mechanics and one designed to induce erroneous dynamics intuitions (e.g., the belief that objects require a continuous force to remain in motion). Participants initially sometimes judged correctly on Newtonian grounds and at other times were misled by the erroneous dynamics reasoning. After working through the thought experiments, however, they corrected their initial misconceptions and no longer endorsed the incorrect dynamics-based judgments.

An example of generative mental simulation is found in causal reasoning. When determining whether one event causes another, people often engage in “counterfactual simulation.” For instance, in an experiment where the first ball hits the second ball, altering its trajectory to reach a target, participants would simulate what would happen if the first ball had not hit the second ball, thereby assessing the causal relationship.

Similar simulation processes are widely applied in the AI field. For example, in deep reinforcement learning, model-based training methods use environmental representations to generate data for training policies, which closely resemble human mental simulations. Some AI algorithms approximate optimal solutions by simulating multiple decision sequences. In both humans and AI, simulations generate “data” that provide essential input for learning and reasoning.
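
As a deliberately simplified illustration of this idea, the sketch below implements a Dyna-style tabular agent on a toy corridor task: each real interaction is stored in a learned model, and the agent then updates its values from imagined transitions replayed out of that model. The task, parameters, and code are illustrative assumptions, not the specific systems discussed in the review.

```python
import random

# Dyna-style sketch: learning from imagined ("simulated") transitions.
states, actions = range(5), [-1, +1]          # a 1-D corridor, move left/right
q = {(s, a): 0.0 for s in states for a in actions}
model = {}                                    # (state, action) -> (reward, next_state)

def step(s, a):                               # the true environment
    s2 = max(0, min(4, s + a))
    return (1.0 if s2 == 4 else 0.0), s2

def update(s, a, r, s2, alpha=0.1, gamma=0.9):
    target = r + gamma * max(q[(s2, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])

for episode in range(50):
    s = 0
    while s != 4:
        a = random.choice(actions)
        r, s2 = step(s, a)                    # one real interaction
        update(s, a, r, s2)
        model[(s, a)] = (r, s2)               # remember what happened
        for _ in range(10):                   # ten imagined interactions
            si, ai = random.choice(list(model))
            ri, s2i = model[(si, ai)]
            update(si, ai, ri, s2i)           # learn from simulation alone
        s = s2
```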


▷ Source: Corey Brickley

(3) Learning through Analogical Reasoning and Comparison

When constructing the theory of natural selection, Darwin analogized artificial selection to biological evolution, thereby deducing the mechanism of variation in natural selection. Such analogical reasoning is common in scientific research, especially when researchers have some understanding of both objects being compared. Through analogical thinking, new insights can emerge, supporting “learning by thinking.”

Learning through analogy does not rely entirely on autonomous thinking but also incorporates external information. Consequently, the learning outcomes reflect both the information provided by the researcher and the participant’s analogical reasoning abilities. However, some studies isolate the impact of analogical thinking by providing all participants with the same analogical information, while only some are prompted to use this information when solving new problems.

For example, a study on mathematical learning showed that the more participants were reminded to compare samples, the less likely they were to be misled by superficial similarities in problems, demonstrating the corrective effect of analogical learning. Another experiment required participants to identify similarities and differences between two groups of robots. The results showed that only participants who actively engaged in comparisons were more likely to discover subtle rules, illustrating the generative effect of analogical thinking.

Analogical reasoning has also attracted attention in the field of artificial intelligence. Similar to human experiments, most demonstrations of analogical reasoning in AI are not purely LbT; instead, AI systems are often required to solve problems related to source analogies. However, recent research indicates that machines can construct analogies through their own thinking or knowledge even without provided source analogies [4]. In tasks such as mathematical problem-solving and code generation, the most effective prompts involve asking large language models (LLMs) to generate multiple related but diverse examples, describe each example, and explain their solutions before providing solutions to new problems. This process likely integrates reasoning and explanation. Such analogical prompts outperform many state-of-the-art LLM performance benchmarks and show some similarity to human learning outcomes under comparison prompts.
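
A rough template in the spirit of the analogical prompting described above might look like the sketch below. The exact wording used in the cited work may differ; the template here is purely illustrative, and the resulting string would be sent to whatever LLM one is using.

```python
# Illustrative analogical-prompt template (an assumption, not the cited prompt).

def analogical_prompt(problem: str, n_examples: int = 3) -> str:
    return (
        f"Problem: {problem}\n\n"
        f"First, recall {n_examples} relevant but diverse example problems.\n"
        "For each example: describe it, then explain its solution step by step.\n"
        "Finally, using what these examples suggest, solve the original problem."
    )

print(analogical_prompt("Find the area of a triangle with vertices (0,0), (4,0), (0,3)."))
```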

Through analogical reasoning, not only can errors be corrected, but deeper levels of thinking can be stimulated, leading to the discovery of new concepts or rules. This is an important learning mechanism both in human learning and artificial intelligence.


▷ Source: NGS ART DEPT

(4) Learning through Reasoning

Even seemingly simple reasoning processes require accurate information and logical processing. For example, I might know that today is Wednesday and also know that “if today is Wednesday, I should not park on a certain campus,” but for various reasons, I might overlook this logical relationship and mistakenly park in a no-parking zone. This illustrates that even when logic holds, additional attention and processing capacity are needed during reasoning.

More complex reasoning requires deeper thinking. For instance:
Premise 1: “Everyone loves anyone who loves someone.”
Premise 2: “My neighbor Sarah loves Taylor Swift.”
Conclusion: “Therefore, Donald Trump loves Kamala Harris.”

The conclusion does follow logically: since Sarah loves someone, everyone loves Sarah; hence Kamala Harris loves someone, so everyone, including Donald Trump, loves Kamala Harris. Yet such a conclusion is difficult to accept in practice, indicating that effective reasoning requires not only logical deduction but also reflection and practical judgment.

Reasoning can also play a corrective role [4]. In one study, participants evaluated the arguments for reasoning problem answers without knowing that these arguments came from their own previous responses. The results showed that two-thirds of the participants were able to correctly reject their own previously invalid reasoning. This type of corrective reasoning plays a crucial role in recognizing erroneous intuitions. For example, in the classic “Cognitive Reflection Test (CRT)” question “The bat and the ball”: the total price of the bat and ball is $1.10, and the bat costs $1.00 more than the ball. The intuitive answer is $0.10 for the ball, but the correct answer is $0.05. Reasoning to the correct answer requires overcoming the initial intuitive response and engaging in more rigorous thinking.
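
For completeness, the arithmetic behind the correct answer can be written out. Letting x be the price of the ball,

\[ x + (x + 1.00) = 1.10 \;\Rightarrow\; 2x = 0.10 \;\Rightarrow\; x = 0.05, \]

so the ball costs $0.05 and the bat $1.05.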


▷ Source: mannhowie.com

In the field of artificial intelligence, traditional symbolic reasoning architectures typically rely on explicit rules or probabilistic calculations. In contrast, the reasoning capabilities of deep learning systems (such as large language models) are still developing. When solving complex problems, prompting LLMs to reason step by step has proven more effective than direct prompting, which is more prone to error. This mode of reasoning is particularly advantageous for high-difficulty tasks, showcasing the potential of AI in the domain of reasoning.
 

2. Unraveling the Mysteries of Learning by Thinking (LbT)

In the preceding sections, we explored how natural and machine brains learn through thinking. Although their operational mechanisms differ, both face the same fundamental question: Why can thinking itself facilitate learning?

The poet and playwright Heinrich von Kleist once described the benefits of learning by speaking: by expressing our thoughts to others, we clarify and develop our ideas in the “workshop of reason.” This observation suggests one way to resolve the LbT paradox: what is gained in learning comes not from the outside world but from a specific state already present within our own minds. In other words, learning in LbT does not involve creating entirely new knowledge but making existing knowledge accessible. Thinking can serve as a source of learning because the foundational knowledge already exists in the mind, even if we are not consciously aware of it.

From a cognitive perspective, the process of learning by reasoning involves the learner combining two premises to derive a conclusion that is logically valid but not explicitly recognized. Although the conclusion is already embedded within the premises, it is only through the reasoning process that the conclusion becomes apparent, thereby constituting learning. In this process, reasoning is not merely a mechanical combination of information; it also generates a new representation that exists independently of the premises.


▷ Source: Caleb Berciunas

Applying this idea to learning by explaining is more complicated. When explaining, learners rarely identify anything that functions as an explicit premise, which makes the accessibility of representations especially important. The format of the input affects which features of a representation are easy to access. For example, when doing addition, whether a number is written in Arabic or Roman numerals strongly influences how easily we can perceive features of the input such as parity. Processes such as explaining, comparing, and simulating can then change these accessibility conditions by supporting representational extraction, creating entirely new representations with new accessibility conditions.
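
A small illustration (not from the paper) of how representational format changes which features are easy to access: parity can be read off the last digit of an Arabic numeral, but a Roman numeral must first be re-represented by conversion before parity becomes available.

```python
# The same number in two formats: parity is trivially accessible from the
# Arabic form, but the Roman form must first be re-represented by conversion.

ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def is_even_arabic(s: str) -> bool:
    return s[-1] in "02468"            # one glance at the final digit

def roman_to_int(s: str) -> int:
    total = 0
    for i, ch in enumerate(s):
        v = ROMAN[ch]
        total += -v if i + 1 < len(s) and ROMAN[s[i + 1]] > v else v
    return total

def is_even_roman(s: str) -> bool:
    return roman_to_int(s) % 2 == 0    # parity only after re-representation

print(is_even_arabic("14"), is_even_roman("XIV"))   # True True
```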

Representational extraction refers to the formation of a specific cognitive structure through thinking or reasoning, which possesses different accessibility conditions. Once a representation is extracted, it becomes a premise that subsequently limits the scope of output.

This is particularly evident in learning by explaining—when learners engage in explanations, they tend to select explanations with fewer root causes. In other words, they are more inclined to consider certain causal hypotheses as more reasonable and superior to others. Such selections create implicit premises that restrict the space for subsequent reasoning. For instance, when learners develop a preference for a particular explanation, this preference itself constitutes a new premise, thereby influencing the conclusions drawn thereafter.


▷ Action of Thinking through Reasoning and Explaining. Source: [2]

Understanding the differences in accessibility and the role of representational extraction helps us extend the logic of reasoning-based learning to other learning forms, such as explaining, comparing, and simulating. The cognitive processes in each form can be seen as continuously extracting new representations and establishing new reasoning premises, thereby progressively advancing the learning process. However, this does not resolve another issue: Why should we expect conclusions derived from explanations, simulations, or other LbT processes to generate new knowledge, that is, true and factual knowledge?

Of course, the outputs of LbT are not always correct. However, because these processes may be influenced by evolution, experience, or the design of artificial intelligence, we can reasonably assume that LbT at least partially reflects the structure of the world, making it possible to produce relatively reliable conclusions [5]. Even if the outputs are not entirely accurate, they can still guide thinking and actions. For example, when learning through explaining, even if the generated explanations are incorrect, the explanation process itself may improve subsequent inquiries and judgments, such as helping learners identify conflicts between representations or represent a domain in a more abstract manner.

Having understood the mechanisms and potential value of LbT, the next question is: Why is LbT necessary?

We can explain this through an analogy with computer systems. In artificial systems, limitations in memory and processing time determine how much prospective computation can be performed. LbT provides a way to generate novel and useful representations on demand, rather than solely relying on existing learning outcomes. Therefore, it can be assumed that LbT is particularly prevalent in intelligent agents with limited resources (such as time and computational power) [6], especially when facing uncertainties in future environments and goals.
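
A loose computational analogy, offered here only as an illustration: a resource-limited program cannot precompute every result it might ever need, so it derives results on demand and caches them for reuse.

```python
from functools import lru_cache

# Loose analogy (an illustration, not the authors' model): derive conclusions
# on demand and cache them, rather than enumerating all consequences upfront.

@lru_cache(maxsize=None)
def shortest_route(origin: str, destination: str) -> str:
    # Stand-in for an expensive derivation; imagine a real planner here.
    return f"route from {origin} to {destination}"

print(shortest_route("home", "campus"))   # derived now, cached for later
print(shortest_route("home", "campus"))   # retrieved, not re-derived
```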

These observations also suggest differences in the role of LbT between natural and artificial thinking. As artificial intelligence escapes some of the resource limitations of human thinking, or when it tackles problems involving little uncertainty (such as operating in very narrow domains), we should expect AI and humans to differ markedly in how and when they engage in LbT.
 

3. Conclusion

Learning by Thinking (LbT) is ubiquitous: humans acquire knowledge not only through observation but also through methods such as explaining, comparing, simulating, and reasoning. Recent studies have shown that artificial intelligence systems can also learn in similar ways. In both contexts, we can resolve the LbT paradox by recognizing that the accessibility conditions of representations change. LbT enables learners to extract representations with new accessibility conditions and use these representations to generate new knowledge and abilities.

In a sense, LbT reveals the limitations of cognition. A system with unlimited resources and little uncertainty could immediately compute all of the consequences of what it observes. However, in reality, both natural and artificial intelligence face limited resources and high uncertainty about future judgments and decisions. In this context, LbT provides a mechanism for on-demand learning: it makes full use of existing representations to address the environment and goals currently faced by the agent.

Nevertheless, there remain many unresolved mysteries regarding the implementation of LbT in natural and artificial intelligence. We do not fully understand how these processes specifically promote the development of human intelligence or under what circumstances they might lead us astray. Uncovering the answers to these questions requires not only deep thinking but also the support of an interdisciplinary cognitive science toolkit.

]]> https://admin.next-question.com/science-news/fantasizing-produce-truth/feed/ 0 How to Unify the Measurement of Consciousness in All Things https://admin.next-question.com/features/unify-the-measurement/ https://admin.next-question.com/features/unify-the-measurement/#respond Sun, 22 Dec 2024 20:27:02 +0000 https://admin.next-question.com/?p=2628

Jean-Dominique Bauby, who once served as the editor-in-chief of the French fashion magazine Elle, describes in his book The Diving Bell and the Butterfly the sensation of a patient with locked-in syndrome being imprisoned in an immobile body, based on his own experiences.

Before this, Alexandre Dumas had already depicted a scenario in The Count of Monte Cristo where a fully intact consciousness is buried within an immobile body—Monsieur Noirtier de Villefort cannot speak or move his limbs, but through eye movements, he tries to prevent a murder and an undesirable marriage.

These real or fictional cases reflect a widespread concern about the “distribution of consciousness.”


▷ The Diving Bell and the Butterfly film adaptation

In fact, “consciousness” is a concept that has only flourished in recent times. Although research related to consciousness can be traced back to ancient Greece, and even hints of consciousness and the subconscious can be found in primitive cave paintings, research focusing specifically on consciousness itself has a very short history.

In the early 1980s, the term “consciousness” was still taboo in serious scientific publications. Many researchers believed that defining consciousness was outdated and ambiguous, and that the use of the term “consciousness” added no value to psychology.

It was not until the late 1980s that consciousness research took a turn for the better*. The intuitive ideas about the “distribution of consciousness” in folk psychology have also received increasing attention [1]. We usually think that people in a waking state are conscious, while in states of intoxication, mental illness, anesthesia, coma, vegetative state, and brain death, the degree of consciousness progressively decreases.

For research on the shift in attitudes toward consciousness, refer to Stanislas Dehaene’s Consciousness and the Brain. This observation is partly based on Dehaene’s personal experience and partly on his academic research.

In addition to folk psychology, clinical practice also pays significant attention to the issue of the distribution of consciousness. Clinically, diagnosing whether a patient is brain-dead or in a vegetative state requires measuring the degree of consciousness. Furthermore, from a broader perspective, how do we determine the existence of consciousness? Which animals can be included in the “consciousness club,” and in what ways do they possess consciousness? For example, is the consciousness of an octopus dispersed or unified? Do brain organoids or artificial intelligence systems, such as large language models with strong communicative abilities, possess consciousness? These studies from different perspectives and levels ultimately point to the investigation of the nature of consciousness: is consciousness unified? Is consciousness multiply realizable (e.g., through physical, computational, or modeling means)?

Currently, there are roughly 22 theories of consciousness, and to adjudicate the distribution of consciousness, we need to base our judgments on a universally accepted theory of consciousness. However, most theories of consciousness are centered on human consciousness, making it very difficult to explain the distribution of consciousness in animals.

To address this, Tim Bayne, Anil Seth, and others have taken a different approach. In the paper “Tests for Consciousness in Humans and Beyond,” they attempt to provide a framework for consciousness tests based on the key characteristics of consciousness testing and offer strategies for proving the validity of these tests. This builds a bridge between consciousness assessments and theoretical frameworks, ultimately leading to a deeper understanding of the nature of consciousness. This testing framework can not only be used to evaluate and revise existing consciousness tests but also guide the construction of new ones.


▷ Bayne, Tim, et al. “Tests for consciousness in humans and beyond.” Trends in Cognitive Sciences (2024).

 

1. From the Hard Problem to the Real Problem

In Consciousness and the Brain, Stanislas Dehaene identifies three key elements that have revitalized consciousness research: a more precise definition of consciousness; discoveries that allow for the experimental manipulation of consciousness; and the academic community’s renewed emphasis on studying subjective phenomena. These factors have collectively helped consciousness research emerge from its “winter.” The advent of new research tools such as electroencephalography (EEG), functional magnetic resonance imaging (fMRI), and magnetoencephalography (MEG) has accelerated this progress.

The three elements emphasized by Dehaene are directly related to the core features of consciousness research. Among them, “experimentally manipulable consciousness” primarily focuses on the accessibility of consciousness and first-person reports. “Subjective phenomena” refer to the “what it is like” subjective characteristics highlighted by Thomas Nagel in his essay “What Is It Like to Be a Bat?” or what Ned Block describes as phenomenal consciousness.

Additionally, the characteristics pointed out by Dehaene reveal the challenges in consciousness research, specifically the distinction between the “hard problem” and the “easy problem” of consciousness as articulated by David Chalmers. The hard problem concerns how phenomenal consciousness arises from its physical basis and how to account for the reality of phenomenal consciousness in the universe. In contrast, the easy problem addresses how physical systems generate responses with specific functional or behavioral performances. Simply put, the hard problem deals with how a physical brain can possess phenomenality, while the easy problem examines how particular types of conscious experiences are produced.

Although addressing the easy problem is not without difficulty, it is relatively straightforward because there is no conceptual gap between physical systems and their functional or behavioral manifestations. However, even if all easy problems are solved, the hard problem remains unresolved. As George Mather stated, the “hardness” that Chalmers refers to implies that solving this problem seems impossible.

Subsequently, the philosophy of mind and consciousness science have pursued the mysteries of consciousness along different paths. The philosophy of mind has developed a series of theories, such as physicalism and functionalism, in an attempt to provide answers to the hard problem. Consciousness science, on the other hand, avoids direct discussion of phenomenality and approaches the subject through first-person reports and the accessibility of consciousness.

For example, the Global Workspace Theory and Higher-Order Thought Theory focus on the function or behavioral manifestations of consciousness. The former posits that the core of consciousness lies in the accessibility of information within a global workspace, while the latter emphasizes higher-order representations that are about first-order representations, reflected in the brain as prefrontal cortex activity directed at other brain regions. Even if subjective reports of conscious phenomena largely align with the explanations provided by these theories, this consistency only indicates some form of correlation and does not directly explain the phenomena themselves.


▷ For Global Neural Workspace Theory, see: Is a Grand Unified Theory of Consciousness Coming?

Anil Seth points out that we should neither be confined to debates over the hard problem nor neglect the phenomenality of consciousness. Instead, we should shift our attention to the real problem of consciousness. The real problem of consciousness attempts to integrate phenomenality, measurement, and explanations of consciousness to explain, predict, and control phenomenally characterized experiences. Anil Seth believes that we can anticipate that answers to the real problem will ultimately allow the hard problem to “fade into the metaphysical fog.” This shift in perspective provides us with a new path for studying consciousness. The transition from the hard problem to the real problem places consciousness testing and theoretical frameworks in a more central role, with the two mutually clarifying and corroborating each other, thereby contributing to a deeper understanding of consciousness.

 

2. Measurement of Consciousness

When discussing consciousness, we often consider related concepts such as wakefulness, attention, intelligence, voluntary behavior, and self-regulation. Generally speaking, individuals in a state of wakefulness are deemed conscious. Patients in a vegetative state, although they retain sleep-wake cycles, do not clearly possess consciousness. The emergence of conscious mental states requires the operation of attention, yet attention can also operate below the threshold of awareness and still contribute to mental states. Therefore, attention is not synonymous with consciousness. Bayne et al. point out that consciousness tests should not focus solely on the abilities associated with consciousness mentioned above but should instead target phenomenal consciousness directly. Abilities that covary with consciousness, such as attention and perceptual organization, are at best tools for guiding the search.

The scope of consciousness research cannot be too broad or too narrow. Humans, capuchin monkeys, mice, and octopuses exhibit different manifestations of consciousness; these different manifestations are neither modeled on human consciousness nor derived from it. In fact, human consciousness represents just a small part of the broader landscape of consciousness. Therefore, defining the boundaries of consciousness research must consider these diverse manifestations, which in turn require distinct tests.


▷ Examples of Consciousness Tests.

 

3. Standard Consciousness Tests

The term “consciousness” typically refers to a state of consciousness, but this does not presuppose the existence of a universally applicable and definitive consciousness test. Methods for assessing consciousness are diverse, each with its own focus. Some consciousness tests concentrate on the general features of consciousness without conveying information about its content, such as the Perturbational Complexity Index (PCI) test. These types of tests focus on neural integration and differentiation rather than on individual behavioral responses or reactions to mental imagery.

Another category of consciousness tests focuses on specific conscious contents or mental abilities sufficient to trigger consciousness. These include bodily sensations (such as pain and smell), autonomous responses (generating and maintaining mental imagery), the ability to discriminate between sub-threshold and supra-threshold stimuli, and various types of learning abilities. The interpretation of these tests depends on whether the selected mental abilities are strong indicators of consciousness.

Faced with such diversity, how can we determine whether a particular consciousness test is applicable to a specific group? Why can the results of a consciousness test be regarded as reliable evidence of the presence or absence of consciousness? Additionally, how should we understand the relationships between different types of consciousness tests? To address these questions, Bayne et al. propose a “four-dimensional space of consciousness tests,” aiming to provide a more systematic framework for understanding consciousness tests.

 

4. The Four-Dimensional Space of Consciousness Tests

The first dimension of the four-dimensional space emphasizes the limited applicability of consciousness tests. In other words, different consciousness tests should be designed for target groups exhibiting varying manifestations of consciousness. Some consciousness tests may be suitable only for humans and other primates; others might apply to a broader range of mammals, and still, others could be utilized for more diverse entities, including evolving biological systems and artificial intelligence. Ideally, a universally applicable consciousness test would encompass all types, but this is unlikely to be achievable in the short term.

Furthermore, the goal of a consciousness test is to accurately determine whether an individual possesses consciousness. This means that the test should neither classify unconscious individuals as conscious (false positives) nor classify conscious individuals as unconscious (false negatives). Avoiding the former corresponds to the specificity of the test, reflected in a low false positive rate; avoiding the latter corresponds to its sensitivity, reflected in a low false negative rate. These two concepts, specificity and sensitivity, constitute the second and third dimensions of the four-dimensional space.


▷ Example: Sensitivity refers to the proportion of truly conscious individuals who are correctly diagnosed as conscious; specificity refers to the proportion of truly unconscious individuals who are correctly diagnosed as unconscious. Correspondingly, a false positive is an unconscious individual incorrectly diagnosed as conscious, and a false negative is a conscious individual mistakenly diagnosed as unconscious.
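
In standard terms (with “positive” meaning “conscious”), sensitivity and specificity can be computed as follows; the counts used here are invented purely for illustration.

```python
# Sensitivity and specificity from a confusion matrix (illustrative counts).

def sensitivity(tp: int, fn: int) -> float:
    """Proportion of truly conscious individuals the test labels conscious."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Proportion of truly unconscious individuals the test labels unconscious."""
    return tn / (tn + fp)

tp, fn = 45, 5     # conscious patients: correctly vs. incorrectly classified
tn, fp = 38, 12    # unconscious patients: correctly vs. incorrectly classified

print(f"sensitivity = {sensitivity(tp, fn):.2f}")   # 0.90: few false negatives
print(f"specificity = {specificity(tn, fp):.2f}")   # 0.76: some false positives
```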

Take, for example, the command-following test for human consciousness. This test is commonly used with patients who exhibit no observable behavioral responses. By monitoring neural activity in response to mental commands, we can assess whether they possess consciousness. In many clinical cases, this test is considered a reliable indicator of the presence of consciousness and is also a crucial tool for determining minimal consciousness states.

However, there are exceptions to this test. Some epilepsy patients may pass the command-following test, but their responses might merely be unconscious habitual actions, such as automatic walking, which do not conclusively demonstrate consciousness. Additionally, conscious patients might fail the test if they do not hear or understand the commands.

This implies that even consciousness tests are not always entirely accurate. The specificity (the ability to correctly exclude unconsciousness) and sensitivity (the ability to correctly identify consciousness) of a test do not always align. A test with high specificity but low sensitivity might miss genuinely conscious patients, while a test with high sensitivity but low specificity might incorrectly classify unconscious patients as conscious.

Specificity and sensitivity are based on the limited applicability of consciousness tests and depend on the specific target group. We cannot expect high specificity for a particular group to transfer to another type of target group. Low-specificity consciousness tests are not entirely useless; they may indicate that current research on the distribution of consciousness within specific groups is misguided.

As previously mentioned, the false positive rate and false negative rate are important indicators of specificity and sensitivity, respectively. However, these are statistical terms that can only provide inductive explanations. Therefore, the final dimension of consciousness tests is rational confidence. This is a second-order criterion that assesses whether the specificity and sensitivity are reasonable, attempting to link scientific tests with intuitive judgments about consciousness in folk psychology*. Tests with high rational confidence better align with our everyday understanding of consciousness, whereas tests with low rational confidence may yield results that starkly contrast with our intuitions, making their outcomes more suggestive than definitive. Establishing this dimension essentially measures the overall efficacy of a consciousness test.

Achieving high rational confidence is challenging in the short term, especially for consciousness tests designed to assess the consciousness levels of other animals, given our limited folk psychological judgments about the distribution of consciousness in non-human species.


▷ Scope of Consciousness Tests: The applicability of different consciousness tests (rows) to various target groups (columns). A plus sign (+) indicates that the test may meaningfully apply to a specific population (although its specificity/sensitivity might be low) and may require some modifications. A dash (−) signifies that the test is not applicable or irrelevant to a particular population. A question mark (?) suggests that the test might be applicable but requires further development to determine its suitability. Lastly, a combination of a plus sign and a question mark (+?) indicates that while the test can be applied, it is unclear what the results signify.

 

5. How to Prove the Validity of Consciousness Tests

How can we demonstrate the validity of a consciousness test and assess its rational confidence? How can we determine the specificity and sensitivity of consciousness tests? First and foremost, we cannot simply conclude that a consciousness test is valid by proving that a particular group possesses consciousness. Instead, we must first establish that the consciousness test is usable and that its results are reliable before we can determine the degree of consciousness present in the target group. Therefore, demonstrating the validity of consciousness tests is crucial.

In this context, Seth, Bayne, and others outline three strategies for proving the validity of consciousness tests in their article. These strategies are not mutually exclusive; they merely focus on different core factors. The redeployment strategy assumes the validity of certain consciousness tests and uses this assumption as a foundation to discuss the validity of other related tests. However, the validity of the foundational tests themselves remains to be demonstrated. The theory-based strategy offers a more robust foundation compared to the redeployment strategy. However, since the field of consciousness theories has not yet reached a definitive conclusion and different consciousness theories support different consciousness tests, there is no unified answer as to which theory should serve as the basis for consciousness tests. Compared to the first two, the iterative natural kind strategy may be a better choice. This strategy views consciousness as a natural kind, providing a basis for further reasoning and generalization. Its iterative nature creates a positive feedback loop between consciousness tests and consciousness theories.

Strategy One: The Redeployment Strategy

The redeployment strategy posits that there are already some consciousness tests with broad validity, and other tests to be considered are variants of these existing tests. Therefore, we can extend the validity of existing tests to their variants. In everyday life, we generally acknowledge that overt behavior signifies consciousness, and covert maintenance of mental imagery is a variant of overt behavior. The command-following test uses this as an indicator to provide evidence for the presence or absence of consciousness.

This is a relatively conservative strategy, merely extending existing valid consciousness tests. However, it also involves a risky leap. Firstly, this strategy is entirely empirical, seemingly assuming that existing consciousness tests have sufficiently validated their effectiveness. However, their validity is only empirically grounded and does not constitute rational justification. Secondly, the extension has boundaries; this strategy only allows for the extension of the original test’s content variants and, in principle, cannot be applied to other target groups. Thus, it ultimately only tests the same target group’s consciousness state in different ways. For example, using a variant of the command-following test to measure the consciousness state of human patients does not allow this test to be applied to artificial intelligence systems.

For the redeployment strategy to achieve a robust foundation, it must either prove the validity of the foundational tests through other means or accept a deflationism about consciousness, which holds that any system that can reliably follow commands is conscious. However, we cannot establish the validity of consciousness tests independently of other consciousness tests, nor do we have an a priori definition of consciousness. Both paths are therefore unfeasible.


▷ Katya Dorokhina

Strategy Two: The Theory-Based Strategy

The redeployment strategy either seeks another foundation or embraces some form of deflationism of consciousness. In any case, this strategy faces fundamental challenges. Since the redeployment strategy lacks a theoretical foundation, some have proposed invoking consciousness theories to demonstrate the validity of consciousness tests. The basic premise of this strategy is that consciousness tests that fit well with well-grounded and validated consciousness theories can serve as reliable indicators of consciousness. For example, the Global Workspace Theory supports the Global Effect Test, and the Information Integration Theory inspires the Perturbational Complexity Index (PCI) test.

The theory-based strategy also encounters several challenges. As previously mentioned, no single consciousness theory has gained widespread acceptance; at least 22 consciousness theories are being seriously considered, some of which have variants, and these theories support different consciousness tests*. If these diverse theories could eventually be integrated, the existing disagreements would no longer pose a challenge. However, this is not the case, as different consciousness theories have not shown a trend towards convergence.

There are instances where different theories support the same consciousness test. Farisco and Changeux, in their 2023 article, discuss the fundamental compatibility between the Perturbational Complexity Index test and the Global Neural Workspace Theory. Nonetheless, overall, different consciousness theories support different consciousness tests.

Based on this, Chalmers and others suggest that an integrated framework for consciousness theories might resolve this issue, with different consciousness theories holding varying weights within this framework and multiple consciousness tests being combined accordingly. This idea may be feasible, but in practice, reaching a consensus on the weights of consciousness theories is challenging, much like the unresolved disputes within consciousness theories themselves.

This strategy may also face the challenge of anthropocentrism. Many consciousness theories are built with humans as the reference point, raising questions about how these theories apply to other groups. For instance, the Global Workspace Theory explains the mechanism of human consciousness but does not specify which systems possess a global workspace. There is evidence that fish have structures similar to a global workspace, and that the caudolateral nidopallium in birds might perform functions analogous to the human prefrontal cortex, but how to assess the relevant similarity remains unclear. Beyond these animals, how should other animals be treated?

The third challenge involves the bidirectional dependency between consciousness science and consciousness philosophy. To validate a consciousness theory, one needs to rely on consciousness tests; simultaneously, consciousness tests require a specific consciousness theory for support. This situation creates a cycle where a theory depends on tests for validation while also being used to validate the tests, forming an intractable loop.

Strategy Three: The Iterative Natural Kind Strategy

To break free from the non-virtuous cycle between consciousness tests and consciousness theories, Bayne and others propose the iterative natural kind strategy. This strategy views consciousness as a natural kind, where natural kinds are categorized based on essential commonalities (rather than superficial similarities, arbitrary feature combinations, or purely human interests), allowing for the division of the world “at its joints” in a way that reflects the structure of the natural world.


▷ Illustration of the Iterative Natural Kind Strategy

If consciousness is a natural kind, this offers several advantages for research. Firstly, different manifestations of consciousness belonging to the same natural kind share the same essence, which can be discovered through iterative steps, much like scientists have uncovered the essence of heat. The process of revealing the essence of heat began with some pre-theoretical assumptions, which were continually refined through the theory’s unity, simplicity, and explanatory power.

The process of uncovering the essence of consciousness could also result from the iteration between experiments and theories, not only aiding in the construction of more accurate consciousness tests but also helping us better understand the distribution of consciousness. Iteration simultaneously involves retaining and transcending; consciousness theories retain some ideas from folk psychology but are not derived from the initial pre-theoretical assumptions. Instead, they transcend and refine these assumptions. The resulting consciousness theories do not contradict folk psychology but are more rigorous, systematic, and grounded in ongoing empirical evidence.

The natural kind strategy can also effectively address the generalization problem. Since consciousness is a natural kind with a shared essence, starting from the characteristics of some groups, we can first extend to proximate groups with similar features and then to other groups with different features. Theoretically, this is feasible, though during the extension process, the confidence in pre-theoretical assumptions diminishes. Therefore, the validity provided by the natural kind strategy is hierarchical.

However, it remains undecided which features to use for expansion and which groups are closest to human consciousness. Different measurement standards can yield different proximities, whether behavioral, functional, or neurophysiological. For example, individuals with a nervous system exhibit varying degrees of response in states of complete motor inhibition, a phenomenon particularly evident in human clinical conditions, human infants, and some animals. In contrast, artificial intelligence systems and cellular structures differ significantly from humans in this regard. Therefore, the better we understand the underlying mechanisms of human consciousness, the more accurately we can stratify different groups.

Since the consciousness of different groups is interrelated, the validity of any consciousness test must rely on other forms of verification. Simultaneously, developing independent and effective consciousness tests is crucial, enabling consciousness tests to not only corroborate each other but also correct one another.

The iterative natural kind strategy presents a virtuous cycle between consciousness tests and consciousness theories, offering a hierarchical and expandable structure for understanding consciousness. This provides a reliable path for deepening our comprehension of consciousness.

 

6. Conclusion

This paper delineates a four-dimensional space for developing robust and generalizable consciousness tests by capturing the key characteristics of consciousness testing. This framework not only enables us to evaluate the strengths and weaknesses of individual tests and better comprehend how they interrelate but also provides a solid foundation for developing more refined assessments. An ideal consciousness test possesses universal applicability, perfect specificity and sensitivity, and high rational confidence, thereby establishing it as an effective measure of consciousness.

Furthermore, the paper introduces three strategies for validating the effectiveness of these tests: the redeployment strategy, the theory-based strategy, and the iterative natural kind strategy. Among these, the iterative natural kind strategy is considered the ideal choice. This strategy posits that consciousness is a natural kind, and we should begin with tests for groups closely related to us and then systematically and purposefully expand to other groups.

The four-dimensional space and the validation strategies serve as a bridge between consciousness tests and consciousness theories, as well as a nexus connecting third-person experimental results with the subjective characteristics of conscious experience. All of this ultimately aids in addressing critical questions in the fields of consciousness science and philosophy of mind: What is consciousness? Is a unified theory of consciousness possible? How should we understand the relationship between consciousness and folk psychology?

]]> https://admin.next-question.com/features/unify-the-measurement/feed/ 0 How Does Food Manipulate the Brain? https://admin.next-question.com/features/food-manipulate-brain/ https://admin.next-question.com/features/food-manipulate-brain/#respond Sun, 22 Dec 2024 19:29:01 +0000 https://admin.next-question.com/?p=2607

On September 6, 2024, the delightful aromas of food wafted from the Sinan Mansion in Shanghai, like an invisible web quietly spreading and attracting numerous visitors to embark on a wonderful experience of “Brain and Food.”

According to Professor Mao Ying, the initiator of the event, Dean of Huashan Hospital Affiliated to Fudan University and Director of the Clinical Translational Research Center of Tianqiao and Chrissy Chen Institute, this interdisciplinary science communication event is the first session of the “Brain Exploration” series, co-hosted by the Tianqiao and Chrissy Chen Institute (TCCI) and the Chinese Neuroscience Society. Several neuroscientists were invited to join renowned food critics in engaging the public through academic presentations, scientific experiments, and storytelling, sharing knowledge about how food influences cognition, emotions, and behavior.
 

1. Four Surprising Truths About Olfaction

The book The Science of Tasting mentions five basic taste types: sour, sweet, bitter, salty, and umami. However, our rich food experiences result from the combined effects of multiple senses, including smell, taste, and touch, with olfaction playing a particularly crucial role. As Researcher Zhou Wen from the Institute of Psychology at the Chinese Academy of Sciences stated during the forum:

“Most perceptions of food flavors come from olfaction rather than taste.”

In fact, there are two main pathways for perceiving odors: the anterior nasal pathway (orthonasal olfaction) and the posterior nasal pathway (retronasal olfaction). In the anterior pathway, odor molecules emitted by food enter through the nostrils and reach the olfactory epithelium. Various olfactory receptors distributed on the olfactory epithelium convert chemical signals into electrical signals, which are then sent to the olfactory bulb, allowing us to smell the food's aroma. The posterior pathway operates during chewing: as food is warmed in the mouth and mixed with saliva, its odor molecules enter the nasal cavity from behind during exhalation, activating olfactory receptors in the same area and transmitting information to the olfactory bulb.


▷ Anterior and posterior nasal pathways for odor perception

“Odors do not exist as physical entities; they are encoded by neural networks in the brain.”

Odors are not inherently contained within the chemical molecules themselves; rather, they are processed by olfactory neurons in the brain to form a dynamic spatiotemporal pattern. Different odor molecules activate different glomeruli at various locations in the olfactory bulb, and the order of activation relates to the components and dynamic characteristics of the odor. Therefore, individual perceptions of odors include a high sensitivity to temporal information and responses to continuous changes.

In this way, our olfactory system acts like a “chemical analyst,” accurately analyzing the chemical composition of the environment and its dynamic changes. Olfactory information begins in the olfactory bulb and spreads via the olfactory tract to multiple key brain regions in the limbic system, such as the piriform cortex, amygdala (emotion center), and entorhinal cortex (brain GPS), resulting in the sensation of smell.


▷ Spatiotemporal Coding of Olfactory Perception

“Humans are also adept at using odors for localization, with abilities rivaling those of dogs, provided they get low enough.”

Since olfactory perception primarily involves ipsilateral conduction, the brain can accurately identify odors sampled from different spatial areas by the left and right nostrils. With two nostrils, humans theoretically possess the ability to perceive direction through smell to facilitate localization. A study on olfactory localization involving humans and dogs showed that [1], with appropriate training, blindfolded humans can effectively navigate using smell and continuously adjust their route, ultimately tracking down a target—chocolate.


▷ Olfactory Localization Experiment

Zhou Wen explained that her laboratory conducted a study on judging the direction of self-motion from optic flow [2], which also found that when visual cues are insufficient to distinguish direction, our judgment of the direction of movement is influenced by the concentration of odors in the nostrils: the brain tends to perceive movement toward the side with the higher concentration. This implies that the brain can integrate the difference in odor concentration between the two nostrils and use it as a directional cue. It's no wonder people often find themselves wandering into the bakery section while shopping.
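
A toy decision rule in the spirit of this finding (a sketch for illustration only, not the study's actual model) might weigh an ambiguous visual signal against the concentration difference between the two nostrils:

```python
# Illustrative rule: when vision is ambiguous, the nostril receiving the
# higher odor concentration biases the perceived heading toward that side.

def perceived_heading(visual_bias: float, left_conc: float, right_conc: float,
                      odor_weight: float = 0.3) -> str:
    """visual_bias: >0 favors 'right', <0 favors 'left', ~0 is ambiguous."""
    odor_bias = right_conc - left_conc          # positive -> stronger on the right
    total = visual_bias + odor_weight * odor_bias
    return "right" if total > 0 else "left"

# Ambiguous vision, slightly stronger odor in the right nostril:
print(perceived_heading(visual_bias=0.0, left_conc=1.0, right_conc=1.2))  # right
```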

“Compared to other senses, olfaction more readily evokes memories and emotions.”

Odors can elicit rich emotional experiences and vivid memory fragments, often imbued with strong spatiotemporal information; specific smells can trigger unique memories of particular people or places. For example, the scent of braised pork may remind one of their mother and scenes from childhood when she was busy in the kitchen; a whiff of flowers may evoke memories of learning to ride a bike with a grandmother…

These spatiotemporal memories carried by smells depend on the hippocampus [3], which plays an important role in memory formation, particularly in autobiographical memory (memory of one's personal past experiences). Olfactory signals are relayed to structures such as the amygdala and hippocampus, further linking emotions, memories, and spatiotemporal perception.


▷ Functional Connectivity of the Hippocampus and Olfactory System

Thus, losing the sense of smell means not only losing a part of sensory experience but also severing a rich emotional world associated with olfactory memories. If prolonged loss of smell occurs due to viral infections or neurodegenerative diseases, it may even affect an individual’s emotional state. To restore olfactory function, it is crucial to reestablish connections between smells and existing memories in the brain. For instance, when seeing soy sauce, one should strive to recall its scent, thereby strengthening neural connections and aiding olfactory recovery.

Olfaction is vital for most people, influencing not only daily life quality but also playing a key role in emotional connections and social interactions.
 

2. The “Manipulated” Eating Behavior

Olfaction helps humans locate food, but actual eating behavior is influenced by multiple internal and external factors, such as physiological state, the allure of food, and individual anxiety levels.

“CRHLHA neurons play a crucial role during eating.”

Researcher Xu Huatai from the Lingang Laboratory introduced a type of neuron in the lateral hypothalamus of mammals called corticotropin-releasing hormone neurons (CRHLHA neurons). These neurons become active during eating, and their firing activity can be observed and recorded in the laboratory. Seeing or smelling food can also stimulate the activation of CRHLHA neurons; during hunger, the activity level of CRHLHA neurons is higher than when satiated, reflecting dynamic changes in appetite.[4]


▷ Activity Levels of CRHLHA Neurons Reflect Appetite Changes

“Even when full, the deliciousness of food can entice further eating.”

Generally, after eating, our attraction to food decreases significantly. However, as the saying goes, “the stomach for meals and the stomach for dessert are never the same.” Even when satiated, CRHLHA neurons can become active again when faced with tempting chocolates or desserts, as if saying, “Just a little more.” Even the brain cannot resist the “pure pleasure” that delicious food brings.


▷ CRHLHA Neurons Activate in Response to Preferred Food Signals When Satiated

“Eating and anxiety behaviors share a common neural circuitry.”

In nature, animals tend to reduce their eating frequency, activity level, and exploratory behavior under stressful conditions, demonstrating what is referred to as “eating anxiety.” In experimental studies, researcher Xu Huatai manipulated the activity of CRHLHA neurons in mice, achieving artificial control over their eating behavior and altering their perception of safety and anxiety levels.


▷ Eating and Safety Perception

By using optogenetic technology, researchers can increase or decrease the activity of CRHLHA neurons, leading to corresponding eating or refusal behaviors in the manipulated mice. In the experiments, artificially lowering the activity of CRHLHA neurons significantly reduced the eating behavior of hungry mice, even in the presence of enticing chocolates, which led to a decreased desire to eat. Conversely, artificially activating CRHLHA neurons caused the mice to “override their instincts,” increasing eating behavior and displaying risk-taking exploratory behavior.

Xu Huatai believes, “Current research results reveal the shared neural basis of eating and anxiety behaviors. However, changes in eating behavior under anxiety in humans are much more complex than in mice. On one hand, people rarely need to consider personal safety when eating in social settings; on the other hand, individuals react differently to anxiety, which can lead to extremes such as binge eating or anorexia.”
 

3. “Out of Control” Eating Motivation

Binge eating and anorexia are two manifestations of “out-of-control” eating motivation. Professor Zhou Yudong, Vice Dean of the School of Brain Science and Brain Medicine at Zhejiang University, explains that “eating behavior is primarily regulated by the reward center in the hypothalamus of the brain.”

“As a natural reward, food—similar to money, games, drugs, and maternal love—can enhance eating behavior through reinforcement.”

What constitutes “delicious food”? While foodies might offer poetic answers, for the brain, the answer may be quite straightforward: “high-energy” foods. Altering the composition of fats, proteins, or carbohydrates in food can effectively influence appetite. High-fat foods, in particular, are more likely to stimulate appetite compared to proteins and carbohydrates. Even in high-stress environments, high-fat foods can persistently drive foraging behavior, a phenomenon known as “compulsive eating” or “uncontrolled eating.”


▷ Reward-driven Uncontrolled Eating Induced by Food

What changes in the brain's control of eating motivation underlie compulsive eating behaviors? Professor Zhou's team discovered that

“The anterior paraventricular nucleus of the thalamus (aPVT) plays a key role in compulsive eating behaviors and approach-avoidance conflicts.”

In animal experiments, mice subjected to high-fat diets exhibited behaviors reminiscent of compulsive eating, such as overcoming fears of open spaces and electric shocks to forage boldly. Notably, activating aPVT neurons in normally fed mice resulted in behaviors similar to those of high-fat diet mice.[5]


▷ aPVT: A New Food Reward Center

“The underlying mechanism of compulsive eating involves metabolic inflammation in the brain caused by excessive nutrient intake.”

Further research revealed that lipid molecules from high-fat food intake trigger inflammation in the aPVT region, promoting increased appetite and driving compulsive eating behaviors. This finding highlights the intricate connections between the nervous system, endocrine system, and immune system. Excessive energy intake leads to energy stress, resulting in central and peripheral metabolic inflammation. This may affect immune and metabolic functions, increasing the risk of metabolic diseases, neurodegenerative disorders (such as Alzheimer’s and Parkinson’s), cardiovascular diseases, and cancers.

Therefore, from a neuroscientific perspective, Professor Zhou reminds us, “There is a close relationship between food choices and disease, emphasizing the need to maintain a balanced diet for both physical and brain health.”


▷ High-fat diet induces inflammation in the aPVT brain region and promotes compulsive eating behavior

From the wondrous role of olfaction to the complex regulation of eating behavior and the phenomenon of “out-of-control” eating motivation, scientists at the “Brain and Food Forum” used vivid experiments and rigorous data to reveal the intricate neuroscience behind delicious food to both online and offline audiences. Looking ahead, as research in brain science continues to deepen, we anticipate uncovering more secrets of the interaction between food and the brain. This will enhance our understanding of human eating behavior and provide new ideas and methods for preventing and treating diet-related diseases.


▷ Scan the QR code above to view the full replay of the forum.

Support for science communication has always been a vital mission of the Tianqiao and Chrissy Chen Institute. In the future, the Tianqiao and Chrissy Chen Institute will continue the “Brain and Food Forum” format in collaboration with the Chinese Neuroscience Society, launching other diverse interdisciplinary exchanges such as “Brain and Sports” and “Brain and Visual Arts.” We hope to gather wisdom and resources from different fields to address complex issues in neuroscience research and deepen public understanding of the brain. We look forward to your participation!

]]> https://admin.next-question.com/features/food-manipulate-brain/feed/ 0 Why Do New Technologies Periodically Appear, Disappear, and Reappear? A New Perspective Based on Free Energy https://admin.next-question.com/science-news/free-energy/ https://admin.next-question.com/science-news/free-energy/#respond Wed, 27 Nov 2024 07:57:26 +0000 https://admin.next-question.com/?p=2587

Abstract

This paper proposes a concise explanation based on the Free Energy Principle (FEP) to account for the behavioral evolution of early Homo erectus from a theoretical biological perspective, particularly focusing on their production of stone axes. “Cognitive surprise” may have prompted early Homo erectus to occasionally exhibit non-traditional or anomalous behaviors. The co-evolutionary dynamics of these behaviors reveal patterns of emergence, disappearance, and re-emergence among Stone Age humans, akin to the game of “Snakes and Ladders.”

When these artifacts appear in the records of the Early and Middle Pleistocene, anthropologists and archaeologists usually interpret them as evidence of early humans climbing the imagined genealogical “ladder.” This interpretation is used to explain how human cognitive abilities gradually developed, leading to increasingly innovative skills and ultimately resulting in the cognitive superiority of Homo sapiens.

However, Héctor Marín Manrique, Karl Friston, and Michael Walker propose a different hypothesis: the behaviors of anomalous individuals are not always accepted by the group. The group may be unable to understand or imagine the potential advantages of these novel behaviors, or even incapable of expressing these differences. This failure in understanding, combined with sporadic demographic events, may lead to these anomalous behaviors being ignored and thus not transmitted to future generations. This situation is akin to sliding down a “snake” in the game of “Snakes and Ladders,” potentially creating discontinuities in the course of human behavioral evolution and resulting in evolutionary mysteries that are difficult to unravel.


▷ Manrique, Héctor Marín, Karl John Friston, and Michael John Walker. “‘Snakes and ladders’ in paleoanthropology: From cognitive surprise to skillfulness a million years ago.” Physics of life reviews (2024).
 

1. “Handaxes”: An Archaeological Case Study

Stone handaxes are among the evidence of human activity during the Early and Middle Pleistocene epochs [1-2]. They are classified as bifacial large cutting tools (BFLCTs), a category that also includes cleavers.


▷ An early handaxe and a cleaver from Western Europe (scale in centimeters).

The longitudinal and transverse symmetry of these handaxes is a common and remarkable feature, which existed as early as approximately 1.7 to 1.6 million years ago [3]. It is widely believed that this symmetry reflects the maker’s intentionality—that is, the shape of the tool was preconceived before its creation [4].

This preconception reflects a neurobiological cognitive ability of the makers: they were able to sculpt a desired three-dimensional shape, such as a handaxe, from a raw stone core [5-9]. Paleolithic archaeologists generally agree that morphological and technical regularities can be perceived in the shapes of these tools. This marks an essential difference between handaxes and other simple stone tools lacking such features, regardless of whether these simple tools originate from handaxe sites.

Although BFLCTs represent complex stone cutting tools made by removing large flakes from stone cores, they are not the earliest cutting tools. The oldest flaked stone tools appeared in Africa’s Late Pliocene around 3.4 million years ago, possibly created by Australopithecines. The oldest fossil bones attributed to humans can be traced back to 2.8 million years ago. Around the beginning of the Early Pleistocene, about 2.58 million years ago, our ancient ancestors began to show signs of cooperation. One of these signs is the tools they made by striking stones together, producing shell-like (conchoidal) pieces well suited to being gripped in the hand.

Compared to BFLCTs, these earlier tools were simpler to manufacture. More refined handaxes first appeared in East Africa around 1.76 million years ago, by which time Homo erectus had replaced several earlier hominins [10-14]. In South Africa, handaxes appeared around 1.6 million years ago [15]. Subsequently, handaxes sporadically appeared in the Paleolithic records of Africa and Eurasia, but their distribution in time and space was uneven. From the later Early Pleistocene onwards, between 1.5 million and 1 million years ago, BFLCTs appeared at a few sites in West Asia and South Asia. In other words, BFLCTs were continuously manufactured by various forms of humans over a period of more than 1.5 million years.

The simple stone tools mentioned earlier, such as choppers that appeared as early as 3 million years ago, continued to be found archaeologically until recent millennia. In other words, even after people began making more refined bifacial handaxes around 1.76 million years ago, the earlier, simpler techniques remained widespread, in sharp contrast to the intermittent manufacture of handaxes. The latter’s record was intermittent and sparse over the more than one million years following their first appearance, indicating it was not a common manufacturing technique. Their scarcity also suggests that hundreds of thousands of years may have passed between the invention of these bifacial stone tools and their important role in Paleolithic life.


▷ Chopper

This raises a series of questions. Since the manufacture of handaxes reflects clear intentionality, how did this understanding arise among early humans? How many times might it have arisen? Did it appear and disappear multiple times in different times and places? [16] Can we assume it arose only once and was skillfully passed down through generations and transmitted between communities, having a profound impact across time and space?

Some Paleolithic archaeologists and paleoanthropologists consider this possibility credible. It is based on a “progressivist” assumption that BFLCTs provided functional advantages to early humans. These advantages had obvious adaptive value, aiding survival and promoting gradual population growth and reproductive success. These skills facilitated population growth and geographic expansion by enabling early humans to extensively exploit resources across different ecological zones and biomes.

But this view has been challenged from many quarters. First, the temporal and spatial span of handaxes is enormous. Even if several archaeological assemblages containing handaxes are found within a geographical area of ≥500 kilometers and can be dated to roughly similar periods (possibly ≥200,000 years), it is difficult to assert that this necessarily represents a continuous, intergenerational “cultural tradition.” If an average generation lasts 25 years, then 200,000 years represents 8,000 generations. This poses a significant challenge to the possibility (let alone the plausibility) of explaining the archaeological record using “social transmission,” “cumulative culture,” and “cultural history” approaches.

Secondly, the temporal and spatial distribution of handaxes is irregular, exhibiting multiple instances of “appearance, disappearance, and reappearance” over time, and their spatial distribution is also relatively scattered. If we consider the manufacture of handaxes as a result of cultural transmission that brought survival advantages, it is hard to explain this sparsity and intermittence. Some suggest that the preservation of handaxes’ “social transmission” or “cultural transmission” in the archaeological record is incomplete, but this is merely a remedial explanation. Finally, if handaxes coexisted with simpler choppers, and the latter did not show significant disadvantages, why were choppers continuously produced?

This paper provides a concise explanation of this phenomenon based on the Free Energy Principle (FEP) [17]. It can explain not only stone tools like handaxes but also other technological developments of ancient humans. To this end, we first introduce a helpful analogy—the game of “Snakes and Ladders.”
 

2. The Analogy of the “Snakes and Ladders” Game

The ancient game of “Snakes and Ladders” is a board game suitable for two or more players. The game board is shown below. The ladders and snakes on the board connect different squares. At the start of the game, players place their tokens on the starting point (square numbered 1) and roll the dice to determine the number of steps to move forward; if a player rolls a six, they get an extra turn.

If a token lands on the bottom of a ladder, it can ascend to the corresponding top of the ladder; if it lands on the head of a snake, it must slide down along the snake’s body to its tail. The first player to reach the finish wins. Obviously, during the game, players’ tokens undergo multiple ascents and descents, sometimes even returning to earlier positions.


▷ Snakes and Ladders: https://en.wikipedia.org/wiki/Snakes_and_ladders

From the perspective of cumulative culture, the development of human technology and culture is like climbing a ladder, continuously ascending. But in reality, the “ascent of humanity” over two million years is much like this game—it has not been smooth sailing, nor has it progressed in a straight line. In an exaggerated sense, it can almost be seen as a fable describing how, in the process of technological and cultural development, we move at a snail’s pace—slowly, hesitantly, stumbling, and sometimes even regressing. The irregular temporal and spatial appearances of technologies like the handaxe can be understood in this way.

The following argument will explain why this developmental pattern is intermittent. This argument is rooted in the theory of self-organizing systems in physics. Simply put, self-organizing systems far from equilibrium inevitably exhibit a kind of wandering behavior, which in many ways is similar to the characteristics of the “Snakes and Ladders” game [17].
 

3. The Free Energy Principle, Active Inference, and Human Evolution

The Free Energy Principle (FEP) offers a fundamental approach based on statistical physics to understand how self-organizing systems (such as organisms) adapt through evolution and how sentient beings produce behavioral responses. In this context, we focus on animals possessing a Hierarchically Mechanistic Mind (HMM) [18-19].

HMM defines the embodied, situated brain as a complex adaptive system that generates perception-action cycles through dynamic interactions among hierarchically organized neurocognitive mechanisms, actively minimizing the entropy (i.e., dispersion or decay) of sensations and bodily states [19]. HMM can be viewed as a neurobiological inference machine [20], operating in accordance with the Free Energy Principle.

The Free Energy Principle states that all organisms aim to minimize free energy to maintain their survival. However, self-organizing systems that can adapt to their environment will, through active inference, respond to changing environments beyond mere perceptual predictions, consuming more energy. The quantity they attempt to minimize is defined as free energy, which measures the difference between external perceptual predictions and internal beliefs (preferences, priors). Simply put, perceptual predictions are “objective inferences” based on externally received information—for example, when walking and encountering rain, we predict we will get wet based on visual observation and the tactile sensation of raindrops—while internal belief preferences are the subjective desire not to get wet. The former is unrelated to whether we dislike getting wet.

Biological systems need to minimize the difference between sensory predictions and internal beliefs as much as possible. To achieve this, besides adjusting beliefs based on external information (e.g., accepting the fact of getting wet in the above scenario), they can also change the state of the external world through actions, such as running to take shelter from the rain. Under the framework of active inference based on the Free Energy Principle, an organism’s cognition and behavior follow the same rules, serving to minimize perceptual surprise—which is a form of prediction error. In information theory and Bayesian conditional probability analysis of predictive coding, this prediction error is often called surprisal or self-information, calculated as the negative logarithm of the probability of possible events.
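In the standard notation of the FEP (a notational sketch using the conventional symbols, not formulas quoted from this article), the surprisal of an observation o is its negative log-probability, and variational free energy F is a tractable upper bound on it:

```latex
% Surprisal (self-information) of an observation o
\mathcal{I}(o) = -\ln p(o)

% Variational free energy, with q(s) an approximate posterior over hidden states s
F = \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big]
  = \underbrace{D_{\mathrm{KL}}\big[q(s)\,\|\,p(s \mid o)\big]}_{\ge 0} - \ln p(o)
  \;\ge\; -\ln p(o)
```

Minimizing F—by updating the beliefs q (perception) or by acting to change the observations o (active inference)—therefore keeps the organism in low-surprise states, which is the sense in which “minimizing free energy” and “avoiding surprising exchanges with the environment” coincide.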


▷ In the Free Energy Principle, the system’s states can be divided into four categories: the external state representing the environment, the sensation state of the agent’s observations, the internal state, and the action state.

For organisms in an ecosystem, negative free energy is regarded as a manifestation of their fitness, requiring them to optimize thermodynamic efficiency in their interactions with ecological niches. Free energy, as an informational measure, quantifies the average surprise of various outcomes for an organism. Therefore, when an organism can accurately model and predict its interactions with its ecological niche, we can say it has adapted to that environment because it avoids surprising exchanges with the environment (e.g., deviating from homeostatic set points) or avoids being in extremely low-probability states (e.g., injury or death).

Certain organisms, such as humans, have developed deep generative models and possess the ability to predict the outcomes of their actions. Such organisms can envisage counterfactual futures under different actions; in simple terms, they can plan [21-25]. Because these organisms incorporate their impending actions into their perceptual predictions, they can forecast potential outcomes and make decisions to perform active inference (planning).

This is a key aspect of active inference: selecting actions and plans based on minimizing expected free energy. Simply put, the choice of action aims to minimize expected surprise, to avoid low-probability adverse events (such as injury or death), and to reduce environmental uncertainty through the outcomes produced. Reducing uncertainty is crucial because it means that perceptual behavior under the deep generative model has an epistemic aspect, keeping perceptual behavior sensitive to salience and novelty.

Only systems with such deep generative models will exhibit this exploratory behavior because they are the only ones capable of responding to cognitive revelations and answering the question, “What will happen if I do this?” [26-28]. Therefore, the expected free energy driving plan or policy selection can be decomposed into pragmatic and epistemic components, supporting exploitative and exploratory behaviors, respectively.
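In the usual active-inference formulation (again a sketch of the standard textbook decomposition rather than an equation from this paper), the expected free energy G of a policy π splits into exactly these two components:

```latex
% Expected free energy of policy \pi at a future time \tau
G(\pi, \tau) =
  \underbrace{-\,\mathbb{E}_{q(o_\tau \mid \pi)}\big[\ln p(o_\tau \mid C)\big]}_{\text{pragmatic value: expected cost w.r.t. preferences } C}
  \;\underbrace{-\,\mathbb{E}_{q(o_\tau \mid \pi)}\Big[D_{\mathrm{KL}}\big[q(s_\tau \mid o_\tau, \pi)\,\|\,q(s_\tau \mid \pi)\big]\Big]}_{\text{epistemic value: expected information gain}}
```

Minimizing G therefore favors plans that both realize preferred outcomes (exploitation) and maximize expected information gain (exploration).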

To put it another way, if an organism anticipates the consequences of future behaviors, its mind is filled with possibilities and surprises before the action unfolds. This expectation of future results prompts the individual to try to reduce perceptual surprise, thereby enhancing control over the environment and gaining a deeper understanding of the possible consequences of various behaviors. This is the driving force behind exploratory behavior.

In the broader context of biological evolution, minimizing free energy supports not only the survival of organisms but also their ability to successfully reproduce offspring [29-30]. Natural selection is gradual and conservative; in the face of environmental interdependencies and regularities, the integrity of organisms’ generative self-organizing systems is supported by adaptive interactions, including active inference. This active inference is closely related to prior expectations induced by generative models and endowed by evolution [31-33]. Active inference instantiates a generative model whose components are neural networks in the brain capable of predicting the most probable upcoming sensations. In the Bayesian statistics of conditional probability, evolution and natural selection can be regarded as natural Bayesian model selection (also known as structure learning [34]).

Therefore, evolution proceeds gradually and intermittently at biological, technological, and even psychosocial levels [35-37]. The Free Energy Principle is related to biology, especially neurobiology, whether at the ontogenetic level of cellular dynamics, neural circuits, and behavior, or at the phylogenetic level of populations evolving through natural selection of biological adaptation and fitness.
 

4. “Cognitive Surprise” and the “Snakes and Ladders” Diagram

Now, let’s apply the earlier perspectives to interpret the archaeological findings related to handaxes. In discussing this topic, we need not overemphasize the role of “group size” in driving the social transmission of handaxe-making technology. Instead, we should focus more on the individual level. That is, what sparked the cognitive awareness in the makers regarding the possibility of crafting stone handaxes? And how did observers (other companions or the makers themselves) react to this innovative approach?

In an environment where traditional tools were mainstream, an individual who spends more time and effort to produce refined and more complex handaxes requires an exploratory inclination. This includes being aware of what results might arise from adopting different manufacturing methods (as seen in the first part of this paper, the form of the handaxe reflects the maker’s preconceived notions of shape). It also involves recognizing that the products of new technology might bring long-term practical value, namely the potential to reduce subsequent energy consumption. All these require the individual to possess deeper generative models.

Thus, we might consider that throughout the temporal and spatial span of handaxe existence, the overall cognitive abilities of various ancient human groups did not reach a level at which they could recognize the innovativeness and long-term value of handaxes [38-39]. However, occasionally, individual members might possess more expressive (i.e., deeper) generative models, discerning both the exploratory (cognitive) and exploitative (practical) value, thereby developing these more complex crafts under chance circumstances. Moreover, other individuals in the group (sometimes including these anomalous individuals themselves) often failed to realize the uniqueness of such innovative behaviors by their companions and did not experience “cognitive surprise” [40].

For bystanders, such behavior did not align with the typical activities in their behavioral repertoire, which were based on their practical prior knowledge. Their strong adherence to prior normative beliefs often overwhelmed the exploration of new cognition. This led to these new creations not being accepted by the group; the creators’ “skills” were seen as a waste of the group’s invaluable and indispensable energy and time needed for survival.

It is worth noting that, in maintaining basic homeostasis, our brain consumes 11.2 watts of energy per kilogram at rest, whereas the entire body consumes only 1.25 watts per kilogram. The brain uses 50% of our glucose intake and 20% of our oxygen intake, with 75-80% of the energy supporting neuronal activity in the brain [41].

Therefore, the activities of Homo erectus one million years ago had to meet greater daily energy demands, which was essential for their survival and, ultimately, for our existence today. In such circumstances, tried-and-tested routines became dominant in life. Those who could efficiently perform these tasks earned the group’s trust, while eccentric, unorthodox, or heterogeneous behaviors were ignored, leaving no traces in collective memory or lore. Moreover, when our ancient ancestors made stone tools by striking large stone blocks, they occasionally knocked off some peculiar shapes in the process. These shapes were actually produced unintentionally, and people at the time might not have realized that these accidentally formed stones could also be used as tools [42-43], even though archaeologists classify them as tools from a comparative morphological perspective.


▷ Vadim Sherbakov

In general, we believe that during the Early and Middle Pleistocene, the fate of those anomalous, heterogeneous behaviors among humans was one of ups and downs. The personal achievements seemingly reached on the technological “ladder” were likely ignored by their companions, who could not imagine or express the survival advantages these behaviors might bring. Furthermore, humans at that time lacked sufficiently fluent communication abilities, and, coupled with various demographic accidents, records of heterogeneous behaviors were often lost. Even if these technologies were remembered in some small hunter-gatherer groups, they might disappear upon extinction due to population fluctuations.

Human communication during that period was very limited; even if language existed, it was quite primitive. Additionally, humans at that time had shorter lifespans, reached biological maturity earlier than now, and had limited brain capacity (these smaller brains were only two-thirds the size of our brains today). In any case, this limited the neurobiological plasticity in small adult brains, thereby affecting deeper cognitive abilities*.

*The fourth section of the original paper focuses on the detailed brain differences between Homo erectus and modern humans. Since it is not directly relevant to the overall logic of this article and serves only as supporting evidence, we will not elaborate here.

Moreover, groups faced various risks of extinction, such as unequal sex ratios, death during childbirth or congenital disabilities, infections, tooth loss, or scarcity of food or water caused by plagues, blights, droughts, floods, wildfires, frosts, blizzards, or other violent climatic events. These factors combined to make it difficult for the results of anomalous behaviors to be stably transmitted within populations.

A final question remains: if the handaxes found at different locations and times were made independently by different individuals, why do they share certain morphological features, such as transverse and longitudinal symmetry?

In this regard, we might understand that the nature of self-organization has inherent periodicity, randomness, and a tendency to minimize free energy. This causes the system to repeatedly explore certain specific states within a vast state space, leading to deep generative models exhibiting certain similarities in preferences (such as an awareness of symmetry; evidently, symmetry is cognitively more concise and conforms to the principle of minimization). At the same time, limitations of raw materials and physiological structures (such as ease of grip) also constrained the forms of manufacture. This allows tools like handaxes unearthed from different places and made at different times to have certain morphological similarities under independent manufacture. Apart from coincidence, this similarity also has certain evolutionary inevitability and is not necessarily the result of cultural transmission.

In summary, when examining the evolution of handaxe technology, we find irregularity and periodicity in technological development. This phenomenon is not a unique historical script of humans but a self-organizing behavior commonly existing in biological systems. The Free Energy Principle provides us with a powerful tool to understand this phenomenon.

Each technological “leap” or “slide” may be an adaptive response to changes in the external environment, including physical environmental changes and shifts in social, economic, and cultural contexts. In the contemporary society of globalization and rapid information flow, human technology and culture may exhibit even more complex dynamic changes.

For this reason, we should promote open innovation systems, allowing the existence and development of “non-traditional” thinking. Through such openness and collaboration, we can minimize the “surprise” elements in technological advancement and foster a more stable and sustainable environment for technological and cultural development.

]]>
https://admin.next-question.com/science-news/free-energy/feed/ 0
Why Haven’t Large Language Models “Killed” Psychology? https://admin.next-question.com/science-news/llm-kills-psychology/ https://admin.next-question.com/science-news/llm-kills-psychology/#respond Sun, 17 Nov 2024 06:00:54 +0000 https://admin.next-question.com/?p=2571

Since the end of 2022, ChatGPT has swept across the globe like a tidal wave, and people are eagerly anticipating its potential applications. Business professionals, scholars, and even ordinary individuals are pondering the same question: How will AI shape the future of our work?

As time goes by, many concepts are gradually becoming reality. Humanity seems to have grown accustomed to AI assisting us or even replacing us in many work scenarios. Early fears about GPT have gradually dissipated; instead, people have become overly reliant on GPT, even overlooking possible limitations and risks. We refer to this excessive dependence on GPT while ignoring its risks as “GPTology.”

The development of psychology has always closely followed technological innovation. Sociologists and behavioral scientists have consistently leveraged technology to collect rich and diverse data. Technologies ranging from neuroimaging and online survey platforms to eye-tracking devices have all contributed to critical breakthroughs in psychology. The digital revolution and the rise of big data have fostered new disciplines like computational social science. Just as in other fields (medicine [1], politics [2]), large language models (LLMs) that can understand, generate, and translate human language with astonishing subtlety and complexity have also had a profound impact on psychology.

In psychology, there are two main applications for large language models: On one hand, studying the mechanisms of LLMs themselves may provide new insights into human cognition. On the other hand, their capabilities in text analysis and generation make them powerful tools for analyzing textual data. For example, they can transform textual data such as individuals’ written or spoken expressions into analyzable data forms, thereby assisting mental health professionals in assessing and understanding an individual’s psychological state. Recently, numerous studies have emerged using large language models to advance psychological research. Applications of ChatGPT in social and behavioral sciences, such as hate speech classification and sentiment analysis, have shown promising initial results and have broad development prospects.

However, should we allow the current momentum of “GPTology” to run unchecked in research? The integration of any new technology is always turbulent, and applying a technology indiscriminately while becoming overly reliant on it can lead to unintended consequences. Looking back at the history of psychology: when functional magnetic resonance imaging (fMRI) first emerged, some researchers misused it, producing absurd yet statistically “significant” neural associations—in one famous demonstration, an fMRI scan of a dead Atlantic salmon appeared to show significant brain activity because multiple comparisons were not properly corrected. Other studies have shown that, owing to such statistical misuse, the likelihood of finding spurious correlations in fMRI research is extremely high. These cases have entered psychology textbooks as a warning to students and researchers to remain vigilant when adopting new technologies.

 

▷ Abdurahman, Suhaib, et al. “Perils and opportunities in using large language models in psychological research.” PNAS Nexus 3.7 (2024): pgad245.

We can say that we have entered a “cooling-off period” in our relationship with large language models. Besides considering what large language models can do, we need to reflect more on whether and why we should use them. A recent review paper in PNAS Nexus explores the application of large language models in psychological research and the new opportunities they bring to the study of human behavior.

The article acknowledges the potential utility of LLMs in enhancing psychology but also emphasizes caution against their uncritical application. Currently, these models may cause statistically significant but meaningless or ambiguous correlations in psychological research, which researchers must avoid. The authors remind us that, in the face of similar challenges encountered in recent decades (such as the replication crisis), researchers should be cautious in applying LLMs. The paper also proposes directions on how to use these models more critically and prudently in the future to advance psychological research.
 

1. Can Large Language Models Replace Human Subjects?

When it comes to large language models (LLMs), people’s most intuitive impression is of their highly “human-like” output capabilities. Webb et al. examined ChatGPT’s analogical reasoning abilities [3] and found that it has already exhibited zero-shot reasoning capabilities, able to solve a wide range of analogical reasoning problems without explicit training. Some believe that if LLMs like ChatGPT can indeed produce human-like responses to common psychological measurements—such as judgments of actions, value endorsements, and views on social issues—they may potentially replace human subject groups in the future.

Addressing this question, Dillon and colleagues conducted a dedicated study [4]. They first compared the moral judgments of humans and the language model GPT-3.5, affirming that language models can replicate some human judgments. However, they also highlighted challenges in interpreting language model outputs. Fundamentally, the “thinking” of LLMs is built upon human natural expressions, but the actual population they represent is limited, and there is a risk of oversimplifying the complex thoughts and behaviors of humans. This serves as a warning because the tendency to anthropomorphize AI systems may mislead us into expecting these systems—operating on fundamentally different principles—to exhibit human-like performance.

Current research indicates that using LLMs to simulate human subjects presents at least three major problems.

First, cross-cultural differences in cognitive processes are an extremely important aspect of psychological research, but much evidence shows that current popular LLMs cannot simulate such differences. Models like GPT are mainly trained on text data from WEIRD (Western, Educated, Industrialized, Rich, Democratic) populations. This English-centric data pipeline perpetuates the English-centrism of psychology and runs counter to aspirations toward linguistic diversity. As a result, language models find it difficult to accurately reflect the diversity of the general population. For example, ChatGPT exhibits gender biases favoring male perspectives and narratives, cultural biases favoring American viewpoints or majority populations, and political biases favoring liberalism, environmentalism, and left-libertarian views. These biases also extend to personality traits, morality, and stereotypes.

Overall, because the model outputs strongly reflect the psychology of WEIRD populations, high correlations between AI and human responses cannot be reproduced when human samples are less WEIRD. In psychological research, over-reliance on WEIRD subjects (such as North American college students) once sparked discussions. Replacing human participants with LLM outputs would be a regression, making psychological research narrower and less universal.

▷ Comparing ChatGPT’s responses to the “Big Five” personality traits with human responses grouped by political views. Note: The figure shows the distribution of responses from humans and ChatGPT on the Big Five personality dimensions and different demographic data. ChatGPT gives significantly higher responses in agreeableness and conscientiousness and significantly lower responses in openness and neuroticism. Importantly, compared to all demographic groups, ChatGPT shows significantly smaller variance across all personality dimensions.

Second, LLMs seem to have a preference for “correct answers.” They exhibit low variability when answering psychological survey questions—even when the topics involved (such as moral judgments) do not have actual correct answers—while human responses to these questions are often diverse. When we ask LLMs to answer the same question multiple times and measure the variance in their answers, we find that language models cannot produce the significant ideological differences that humans do. This is inseparable from the principles behind generative language models; they generate output sequences by calculating the probability distribution of the next possible word in an autoregressive manner. Conceptually, repeatedly questioning an LLM is similar to repeatedly asking the same participant, rather than querying different participants.
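To make the variance point concrete, here is a minimal sketch (with hypothetical Likert-scale numbers, not data from the studies cited) of the comparison being described: many repeated responses from one model to the same item versus one response each from many human participants.

```python
import numpy as np

# Hypothetical 1-7 Likert responses to the same survey item.
# "model": the same LLM queried 1,000 times with an identical prompt.
# "humans": 1,000 different participants answering once each.
rng = np.random.default_rng(0)
model_responses = rng.choice([4, 5], size=1000, p=[0.2, 0.8])  # clusters on a "preferred" answer
human_responses = rng.integers(1, 8, size=1000)                # spread across the whole scale

for label, x in [("model", model_responses), ("humans", human_responses)]:
    print(f"{label:>6}: mean = {x.mean():.2f}, variance = {x.var():.2f}")

# Conceptually, the 1,000 model queries behave like one participant
# asked the same question 1,000 times.
```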

However, psychologists are usually interested in studying differences between different participants. This warns us that when attempting to use LLMs to simulate human subjects, we cannot simply use language models to simulate group averages or an individual’s responses across different tasks. Appropriate methods should be developed to truly reproduce the complexity of human samples. Additionally, the data used to train LLMs may already contain many items and tasks used in psychological experiments, causing the model to rely on memory rather than reasoning when tested, further exacerbating the above issues. To obtain an unbiased evaluation of LLMs’ human-like behavior, researchers need to ensure that their tasks are not part of the model’s training data or adjust the model to avoid affecting experimental results, such as through methods like “unlearning.”

Finally, it is questionable whether GPT has truly formed a moral system similar to that of humans. By querying the LLM and constructing its internal nomological network—observing the correlations between different moral domains—it was found that these metrics differ significantly from results obtained from humans.

▷ ChatGPT and Human Moral Judgments. Note: a) Distributions of human moral judgments (light blue) and GPT (light red) across six moral domains. Dashed lines represent means. b) Interrelationships between human moral values (N=3,902) and ChatGPT responses (N=1,000). c) Partial correlation networks among moral values based on different human samples from 19 countries (30) and 1,000 GPT responses. Blue edges indicate positive partial correlations; red edges indicate negative partial correlations.

In summary, LLMs ignore population diversity, cannot exhibit significant variance, and cannot replicate nomological networks—these shortcomings indicate that LLMs should not replace studies on Homo sapiens. However, this does not mean that psychological research should completely abandon the use of LLMs. On the one hand, applying psychological measurements traditionally used for humans to AI is indeed interesting, but interpretations of the results should be more cautious. On the other hand, when using LLMs as proxy models to simulate human behavior, their intermediate layer parameters can provide potential angles for exploring human cognitive behavior. However, this process should be conducted under strictly defined environments, agents, interactions, and outcomes.

Due to the “black box” nature of LLMs and the aforementioned situation where their outputs often differ from real human behavior, this expectation is still difficult to realize. But we can hope that in the future, more robust programs can be developed, making it more feasible for LLMs to simulate human behavior in psychological research.
 

2. Are Large Language Models a Panacea for Text Analysis?

Apart from their human-like qualities, the most significant feature of large language models (LLMs) is their powerful language processing capability. Applying natural language processing (NLP) methods to psychological research is not new. To understand why the application of LLMs has sparked considerable controversy today, we need to examine how their use differs from traditional NLP methods.

NLP methods utilizing pre-trained language models can be divided into two categories based on whether they involve parameter updates. Models involving parameter updates are further trained on specific task datasets. In contrast, zero-shot learning, one-shot learning, and few-shot learning do not require gradient updates; they directly leverage the capabilities of the pre-trained model to generalize from limited or no task-specific data, completing tasks by utilizing the model’s existing knowledge and understanding.

The groundbreaking leap in LLM capabilities—for example, their ability to handle multiple tasks without specific adjustments and their user-friendly designs that reduce the need for complex coding—has led to an increasing number of studies applying their zero-shot capabilities* to psychological text analysis, including sentiment analysis, offensive language detection, mindset assessment, and emotion detection.

*The zero-shot capability of LLMs refers to the model’s ability to understand and perform new tasks without having been specifically trained or optimized for those tasks. For example, a large language model can recognize whether a text is positive, negative, or neutral by understanding its content and context, even without targeted training data.
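As an illustration of what “zero-shot” means in practice, the sketch below uses the Hugging Face zero-shot-classification pipeline; the model choice and candidate labels are illustrative assumptions, not the setup of any study discussed here.

```python
from transformers import pipeline

# Zero-shot sentiment labelling: no task-specific training data and no gradient
# updates; a pre-trained NLI model scores each candidate label directly.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

texts = [
    "I finally feel like things are getting better.",
    "Nothing I do seems to matter anymore.",
]
candidate_labels = ["positive", "negative", "neutral"]

for text in texts:
    result = classifier(text, candidate_labels=candidate_labels)
    print(text, "->", result["labels"][0], round(result["scores"][0], 3))
```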

However, as applications deepen, more voices are pointing out the limitations of LLMs. First, LLMs may produce inconsistent outputs when faced with slight variations in prompts, and when aggregating multiple repeated outputs to different prompts, LLMs sometimes fail to meet the standards of scientific reliability. Additionally, Kocoń et al. [5] found that LLMs may encounter difficulties when handling complex, subjective tasks such as sentiment recognition. Lastly, when compared with traditional fine-tuned models, the zero-shot convenience of LLMs may not confer as large an advantage as is commonly believed.

We should recognize that small language models fine-tuned for various tasks are also continuously developing, and more models are becoming publicly available today. Moreover, an increasing number of high-quality and specialized datasets are available for researchers to fine-tune language models. Although the zero-shot applications of LLMs may provide immediate convenience, the most straightforward choice is often not the most effective one, and researchers should maintain necessary caution when attracted by convenience.

To observe ChatGPT’s capabilities in text processing more intuitively, researchers set up three levels of models: zero-shot, few-shot, and fine-tuned, to extract moral values from online texts. This is a challenging task because even trained human annotators often disagree. The expression of moral values in language is usually extremely implicit, and due to length limitations, online posts often contain little background information. Researchers provided 2,983 social media posts containing moral or non-moral language to ChatGPT, asking it to judge whether the posts used any specific types of moral language. They then compared it with a small BERT model fine-tuned on a separate subset of social media posts, using human evaluators’ judgments as the standard.

The results showed that the fine-tuned BERT model performed far better than ChatGPT in the zero-shot setting: BERT achieved an F1 score of 0.48, while ChatGPT reached only 0.22. Even a LIWC-based method surpassed zero-shot ChatGPT, with an F1 score of 0.27. ChatGPT’s predictions of moral sentiment were markedly extreme, whereas BERT showed no significant differences from trained human annotators in almost all cases.

Although LIWC is a smaller, less complex, and less costly model, it deviated from trained human annotators less often and less severely than ChatGPT did. As expected, both few-shot learning and fine-tuning improved ChatGPT’s performance in the experiment. We draw two conclusions. First, the cross-context flexibility claimed for LLMs may not always hold. Second, although LLMs are very convenient to use “plug-and-play,” they can sometimes fail completely, and appropriate fine-tuning can mitigate these issues.
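An evaluation of this kind can be reproduced in outline with scikit-learn; the label arrays below are placeholders for illustration, not the authors’ data.

```python
from sklearn.metrics import f1_score

# Gold labels from human annotators and model predictions for whether each
# post contains moral language (1) or not (0); placeholder values only.
gold         = [1, 0, 1, 1, 0, 0, 1, 0]
gpt_zeroshot = [1, 1, 0, 1, 1, 0, 0, 0]
bert_tuned   = [1, 0, 1, 1, 0, 0, 0, 0]

for name, pred in [("ChatGPT zero-shot", gpt_zeroshot), ("fine-tuned BERT", bert_tuned)]:
    print(f"{name}: F1 = {f1_score(gold, pred):.2f}")
```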

▷ Jean-Michel Bihorel

In addition to inconsistencies in text annotation, inadequacies in explaining complex concepts (such as implicit hate speech), and possible lack of depth in specialized or sensitive domains, the lack of interpretability is also a much-criticized aspect of LLMs. As powerful language analysis tools, LLMs derive their extensive functions from massive parameter sets, training data, and training processes. However, this increase in flexibility and performance comes at the cost of reduced interpretability and reproducibility. The so-called stronger predictive power of LLMs is an important reason why researchers in psychological text analysis tend to use neural network–based models. But if they cannot significantly surpass top-down methods, the advantages in interpretability of the latter may prompt psychologists and other social scientists to turn to more traditional models.

Overall, in many application scenarios, smaller (fine-tuned) models can be more powerful and less biased than current large (generative) language models. This is especially true when large language models are used in zero-shot and few-shot settings. For example, when exploring the language of online support forums for anxiety patients, researchers using smaller, specialized language models may be able to discover subtle details and specific language patterns directly related to the research field (e.g., worries, tolerance of uncertainty). This targeted approach can provide deeper insights into the experiences of anxiety patients, revealing their unique challenges and potential interventions. By leveraging specialized language models or top-down methods like CCR and LIWC, researchers can strike a balance between breadth and depth, enabling a more nuanced exploration of text data.

Nevertheless, as text analysis tools, LLMs may still perform valuable functions in cases where fine-tuning data is scarce, such as emerging concepts or under-researched groups. Their zero-shot capabilities enable researchers to explore pressing research topics. In these cases, adopting few-shot prompting methods may be both effective and efficient, as they require only a small number of representative examples.

Moreover, studies have shown that LLMs can benefit from theory-driven methods. Based on this finding, developing techniques that combine the advantages of both approaches is a promising direction for future research. With the rapid advancement of large language model technology, solving performance and bias issues is only a matter of time, and it is expected that these challenges will be effectively alleviated in the near future.
 

3. Reproducibility Cannot Be Ignored

Reproducibility refers to the ability to replicate and verify results using the same data and methods. However, the black-box nature of LLMs makes related research findings difficult to reproduce. For studies that rely on data or analyses generated by LLMs, this limitation poses a significant obstacle to achieving reproducibility.

For example, after an LLM is updated, its preferences may change, potentially affecting the effectiveness of previously established “best practices” and “debiasing strategies.” Currently, ChatGPT and other closed-source models do not provide their older versions, which limits researchers’ ability to reproduce results using models from specific points in time. For instance, once the “gpt-3.5-January-2023” version is updated, its parameters and generated outputs may change, challenging the rigor of scientific research. Importantly, new versions do not guarantee the same or better performance on all tasks. For example, GPT-3.5 and GPT-4 have been reported to produce inconsistent results on various text analysis tasks—GPT-4 sometimes performs worse than GPT-3.5 [6]—which further deepens concerns about non-transparent changes in the models.

Beyond considering the black-box nature of LLMs from the perspective of open science, researchers are more concerned with the scientific spirit of “knowing what it is and why it is so.” When obtaining high-quality and informative semantic representations, we should focus more on the algorithms used to generate these outputs rather than the outputs themselves. In the past, one of the main advantages of computational models was that they allowed us to “peek inside”; certain psychological processes that are difficult to test can be inferred through models. Therefore, using proprietary LLMs that do not provide this level of access may hinder researchers in psychology and other fields from benefiting from the latest advances in computational science.

 

▷ Stuart McReath
 

4. Conclusion

The new generation of online service-oriented large language models (LLMs) developed for the general public—such as ChatGPT, Gemini, and Claude—provides many researchers with tools that are both powerful and easy to use. However, as these tools become more popular and user-friendly, researchers have a responsibility to maintain a clear understanding of both the capabilities and limitations of these models. Particularly in certain tasks, the excellent performance and high interactivity of LLMs may lead people to mistakenly believe that they are always the best choice as research subjects or automated text analysis assistants. Such misconceptions can oversimplify people’s understanding of these complex tools and result in unwise decisions. For example, avoiding necessary fine-tuning for the sake of convenience or due to a lack of understanding may prevent full utilization of their capabilities, ultimately leading to relatively poor outcomes. Additionally, it may cause researchers to overlook unique challenges related to transparency and reproducibility.

We also need to recognize that many advantages attributed to LLMs exist in other models as well. For instance, BERT or open-source LLMs can be accessed via APIs, providing researchers who cannot self-host these technologies with a convenient and low-cost option. This enables these models to be widely used without requiring extensive coding or technical expertise. Additionally, OpenAI offers embedding models (such as text-embedding-ada-002) that can be used for downstream tasks in much the same way as BERT embeddings.

Ultimately, the responsible use of any computational tool requires us to fully understand its capabilities and carefully consider whether it is the most suitable method for the current task. This balanced approach ensures that technological advances are utilized effectively and responsibly in research.

 

]]>
https://admin.next-question.com/science-news/llm-kills-psychology/feed/ 0
How to Bridge the Gap Between Artificial Intelligence and Human Intelligence? https://admin.next-question.com/features/bridge-the-gap/ https://admin.next-question.com/features/bridge-the-gap/#respond Wed, 30 Oct 2024 07:51:46 +0000 https://admin.next-question.com/?p=2545

1. Artificial Intelligence vs. Human Intelligence

1.1 How did early artificial intelligence models draw inspiration from our understanding of the brain?

The early development of artificial intelligence was greatly influenced by our understanding of the human brain. In the mid-20th century, advancements in neuroscience and initial insights into brain function led scientists to attempt applying these biological concepts to the development of machine intelligence.

In 1943, neurophysiologist Warren McCulloch and mathematician Walter Pitts introduced the “McCulloch-Pitts neuron model,” one of the earliest attempts in this area. The model described the activity of neurons using mathematical logic; although simplistic, it laid the groundwork for later artificial neural networks.

▷ Fig.1: Structure of a neuron and the McCulloch-Pitts neuron model.

During this period, brain research primarily focused on how neurons process information and interact within complex networks via electrical signals. These studies inspired early AI researchers to design primitive artificial neural networks.

In the 1950s, the perceptron, invented by Frank Rosenblatt, was an algorithm inspired by biological visual systems, simulating how the retina processes information by receiving light. Although it was rudimentary, it marked a significant step forward in the field of machine learning.

 

▷ Fig.2: Left: Rosenblatt’s physical perceptron; Right: Structure of the perceptron system.
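A perceptron of this kind can be written in a few lines; the toy data, learning rate, and epoch count below are illustrative choices, not Rosenblatt’s original configuration.

```python
import numpy as np

def perceptron_train(X, y, lr=0.1, epochs=20):
    """Rosenblatt-style perceptron: a threshold unit with error-driven weight updates."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0  # McCulloch-Pitts-style threshold activation
            error = target - pred
            w += lr * error * xi               # adjust weights only when the prediction is wrong
            b += lr * error
    return w, b

# Toy linearly separable problem: logical AND of two binary inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = perceptron_train(X, y)
print([1 if x @ w + b > 0 else 0 for x in X])  # -> [0, 0, 0, 1]
```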

In addition to the influence of neuroscience, early cognitive psychology research also contributed to the development of AI. Cognitive psychologists sought to understand how humans perceive, remember, think, and solve problems, providing a methodological foundation for simulating human intelligent behavior in AI. For instance, the Logic Theorist, developed by Allen Newell and Herbert A. Simon, could prove mathematical theorems [1-3], simulating the human problem-solving process and, to some extent, mimicking the logical reasoning involved in human thought.

Although these early models were simple, their development and design were profoundly shaped by contemporary understandings of the brain, which established a theoretical and practical foundation for the development of more complex systems. Through such explorations, scientists gradually built intelligent systems capable of mimicking or even surpassing human performance in specific tasks, driving the evolution and innovation of artificial intelligence technology.

1.2 Development of Artificial Intelligence

Since then, the field of artificial intelligence has experienced cycles of “winters” and “revivals.” In the 1970s and 1980s, improvements in computational power and innovations in algorithms, such as the introduction of the backpropagation algorithm, made it possible to train deeper neural networks. Although artificial intelligence achieved commercial success in certain areas during this period, such as expert systems, technological limitations combined with inflated expectations ultimately led to periods of disillusionment known as “AI winters.”

Entering the 21st century, especially after 2010, the field of artificial intelligence has witnessed unprecedented advancements. The exponential growth of data, the proliferation of high-performance computing resources like GPUs, and further optimization of algorithms propelled deep learning technologies as the main driving force behind AI development.

The core of deep learning remains the simulation of how brain neurons process information, and its applications have far surpassed initial expectations, encompassing numerous fields such as image recognition, natural language processing, autonomous vehicles, and medical diagnostics. These groundbreaking advancements have not only driven technological progress but also fostered the emergence of new business models and rapid industry development.

▷ Giordano Poloni

1.3 Differences Between Artificial Intelligence and Human Intelligence

1.3.1 Differences in Functional Performance

Although artificial intelligence has surpassed human capabilities in specific domains such as board games and certain image and speech recognition tasks, it generally lacks cross-domain adaptability. While some AI systems, particularly deep learning models, excel in big-data environments, they typically require vast amounts of labeled data for training, and their transfer-learning abilities are limited when tasks or environments change, often necessitating specially designed algorithms. In contrast, the human brain possesses robust learning and adaptation capacities: it can learn new tasks from minimal data under varied conditions and can transfer knowledge gained in one domain to seemingly unrelated areas.

In terms of flexibility in addressing complex problems, AI performs best on well-defined and structured problems, such as board games and language translation. Its efficiency drops when dealing with ambiguous, unstructured problems, making it susceptible to interference. The human brain, by contrast, exhibits high flexibility and efficiency in processing vague and complex environmental information; for instance, it can recognize sounds in noisy environments and make decisions despite incomplete information.

Regarding consciousness and cognition, current AI systems lack true awareness and emotions. Their “decisions” are purely algorithmic outputs based on data, devoid of subjective experience or emotional involvement. Humans, on the other hand, not only process information but also possess consciousness, emotions, and subjective experiences, which are essential components of human intelligence.

In multitasking, while some AI systems can handle multiple tasks simultaneously, this often requires complex, purpose-built designs. Most AI systems are designed for single tasks, and their efficiency and effectiveness in multitasking typically do not match those of the human brain, which can switch between tasks quickly while maintaining high efficiency.

In terms of energy consumption and efficiency, advanced AI systems, especially large machine learning models, often demand significant computational resources and energy, far exceeding that of the human brain. The brain operates on about 20 watts, showcasing exceptionally high information processing efficiency.

Overall, while artificial intelligence has demonstrated remarkable performance in specific areas, it still cannot fully replicate the human brain, particularly in flexibility, learning efficiency, and multitasking. Future AI research may continue to narrow these gaps, but the complexity and efficiency of the human brain remain benchmarks that are difficult to surpass.

▷ Spooky Pooka ltd

1.3.2 Differences in Underlying Mechanisms

In terms of structure, modern AI systems, especially neural networks, are inspired by biological neural networks, yet the “neurons” (typically computational units) and their interconnections rely on numerical simulations. The connections and processing in these artificial neural networks are usually pre-set and static, lacking the dynamic plasticity of biological neural networks. The human brain comprises approximately 86 billion neurons, each connected to thousands to tens of thousands of other neurons via synapses [6-8], supporting complex parallel processing and highly dynamic information exchange.

Regarding signal transmission, AI systems transmit signals through numerical calculations. For instance, in neural networks, the output of a neuron is a function of the weighted sum of its inputs, passed through simple mathematical functions such as the sigmoid or ReLU. In the brain, by contrast, signal transmission relies on electrochemical processes: information exchange between neurons occurs through the release of neurotransmitters at synapses and is regulated by various biochemical processes.

In terms of learning mechanisms, AI learning typically adjusts parameters (such as weights) through algorithms like backpropagation. Although technically effective, this approach requires substantial amounts of data and necessitates retraining or significant adjustment of model parameters for new datasets, highlighting a gap compared to the brain's continuous, largely unsupervised learning. Learning in the human brain relies on synaptic plasticity, in which the strength of neural connections changes with experience and activity, supporting ongoing learning and memory formation.
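As a minimal sketch of the parameter-adjustment loop described above, reduced to a single linear unit trained by gradient descent on a squared-error loss (illustrative values; full backpropagation applies the same chain-rule logic layer by layer):

```python
import numpy as np

# One gradient-descent step at a time for a single linear unit, y_hat = w . x
# Loss: L = 0.5 * (y_hat - y)^2, so dL/dw = (y_hat - y) * x
x = np.array([0.5, -1.2, 3.0])   # illustrative input
y = 1.0                          # illustrative target
w = np.array([0.1, 0.0, 0.2])    # initial weights
lr = 0.01                        # learning rate

for step in range(3):
    y_hat = np.dot(w, x)
    grad = (y_hat - y) * x       # gradient of the loss w.r.t. the weights
    w -= lr * grad               # weight update
    print(f"step {step}: loss = {0.5 * (y_hat - y) ** 2:.4f}")
```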

1.4 Background and Definition of the Long-Term Goal of Simulating Human Intelligence—Artificial General Intelligence

The concept of Artificial General Intelligence (AGI) arose from recognizing the limitations of narrow artificial intelligence (AI). Narrow AI typically focuses on solving specific, well-defined problems, such as board games or language translation, but lacks the flexibility to adapt across tasks and domains. As technology advanced and our understanding of human intelligence deepened, scientists began to envision an intelligent system with human-like cognitive abilities, self-awareness, creativity, and logical reasoning across multiple domains.

AGI aims to create an intelligent system capable of understanding and solving problems across various fields, with the ability to learn and adapt independently. This system would not merely serve as a tool; instead, it would participate as an intelligent entity in human socio-economic and cultural activities. The proposal of AGI represents the ideal state of AI development, aspiring to achieve and surpass human intelligence in comprehensiveness and flexibility.

 

2. Pathways to Achieving Artificial General Intelligence

Diverse neuron simulations and network structures exhibit varying levels of complexity. Neurons with richer dynamic descriptions possess higher internal complexity, while networks with wider and deeper connections exhibit greater external complexity. From the perspective of complexity, there are currently two promising pathways to achieve Artificial General Intelligence: one is the external complexity large model approach, which involves increasing the width and depth of the model; the other is the internal complexity small model approach, which entails adding ion channels to the model or transforming it into a multi-compartment model.

▷ Fig.3: Internal and external complexity of neurons and networks.

 

2.1 External Complexity Large Model Approach

In the field of artificial intelligence (AI), researchers increasingly rely on the development of large AI models to tackle broader and more complex problems. These models typically feature deeper, larger, and wider network structures, known as the “external complexity large model approach.” The core of this method lies in enhancing the model’s ability to process information (especially when dealing with large data sets) and learn by scaling up the model.

2.1.1 Applications of Large Language Models

Large language models, such as GPT (Generative Pre-trained Transformer) and BERT (Bidirectional Encoder Representations from Transformers), are currently hot topics in AI research. These models learn from vast text data using deep neural networks, master the deep semantics and structures of language, and demonstrate exceptional performance in various language processing tasks. For instance, GPT-3, trained on a massive text dataset, not only generates high-quality text but also performs tasks like question answering, summarization, and translation.

The primary applications of these large language models include natural language understanding, text generation, and sentiment analysis, making them widely applicable in fields such as search engines, social media analysis, and automated customer service.

2.1.2 Why Expand Model Scale?

According to research by Jason Wei, Yi Tay, William Fedus, and others in “Emergent Abilities of Large Language Models,” as the model size increases, a phenomenon of “emergence” occurs, where certain previously latent capabilities suddenly become apparent. This is due to the model’s ability to learn deeper patterns and associations by processing more complex and diverse information.

For example, ultra-large language models can exhibit problem-solving capabilities for complex reasoning tasks and creative writing without specific targeted training. This phenomenon of “emergent intelligence” suggests that by increasing model size, a broader cognitive and processing capability closer to human intelligence can be achieved.

▷ Fig.4: The emergence of large language models.

2.1.3 Challenges

Despite the unprecedented capabilities brought by large models, they face significant challenges, particularly concerning efficiency and cost.

First, these models require substantial computational resources, including high-performance GPUs and extensive storage, which directly increases research and deployment costs. Second, the energy consumption of large models is increasingly concerning, affecting their sustainability and raising environmental issues. Additionally, training these models requires vast amounts of data input, which may lead to data privacy and security issues, especially when sensitive or personal information is involved. Finally, the complexity and opacity of large models may render their decision-making processes difficult to interpret, which could pose serious problems in fields like healthcare and law, where high transparency and interpretability are crucial.

2.2 Internal Complexity Small Model Approach

Large language models leave a strong impression with their highly "human-like" outputs. Webb et al. examined the analogical reasoning abilities of ChatGPT [3] and found that it exhibits emergent zero-shot reasoning, enabling it to address a wide range of analogical reasoning problems without explicit training. Some believe that if large language models (LLMs) like ChatGPT can indeed produce human-like responses on common psychological measures (such as judgments about actions, recognition of values, and views on social issues), they may eventually stand in for human subject groups.

2.2.1 Theoretical Foundations

Neurons are the fundamental structural and functional units of the nervous system, primarily composed of cell bodies, axons, dendrites, and synapses. These components work together to receive, integrate, and transmit information. The following sections will introduce the theoretical foundations of neuron simulation, covering neuron models, the conduction of electrical signals in neuronal processes (dendrites and axons), synapses and synaptic plasticity models, and models with complex dendrites and ion channels.

▷ Fig.5: Structure of a neuron.

 

(1) Neuron Models

Ion Channels

Ion channels and pumps in neurons are crucial membrane proteins that regulate the transmission of neural electrical signals. They control the movement of ions across the cell membrane, thereby influencing the electrical activity and signal transmission of neurons. These structures ensure that neurons can maintain resting potentials and generate action potentials, forming the foundation of the nervous system’s function.

Ion channels are protein channels embedded in the cell membrane that regulate the passage of specific ions (such as sodium, potassium, calcium, and chloride). Various factors, including voltage changes, chemical signals, and mechanical stress, control the opening and closing of these ion channels, impacting the electrical activity of neurons.

▷ Fig.6: Ion channels and ion pumps for neurons.

Equivalent Circuit

The equivalent circuit model simulates the electrophysiological properties of neuronal membranes using circuit components, allowing complex biological electrical phenomena to be explained and analyzed within the framework of physics and engineering. This model typically includes three basic components: membrane capacitance, membrane resistance, and a power source.

The cell membrane of a neuron exhibits capacitive properties related to its phospholipid bilayer structure. The hydrophobic core of the lipid bilayer prevents the free passage of ions, resulting in high electrical insulation of the cell membrane. When the ion concentrations differ on either side of the cell membrane, especially under the regulation of ion pumps, charge separation occurs. Due to the insulating properties of the cell membrane, this charge separation creates an electric field that allows the membrane to store charge.

Capacitance elements are used to simulate this charge storage capability, with capacitance values depending on the membrane’s area and thickness. Membrane resistance is primarily regulated by the opening and closing of ion channels, directly affecting the rate of change of membrane potential and the cell’s response to current input. The power source represents the electrochemical potential difference caused by the ion concentration gradient across the membrane, which drives the maintenance of resting potential and the changes in action potential.

▷ Fig.7: Schematic diagram of the equivalent circuit.
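As a rough numerical sketch of this equivalent circuit, the following Python snippet integrates a purely passive RC membrane (capacitance, leak resistance, and a resting "battery") in response to a constant injected current; all parameter values are illustrative.

```python
import numpy as np

# Passive RC membrane: C_m * dV/dt = -(V - E_rest) / R_m + I_inj
C_m = 200e-12      # membrane capacitance in farads (illustrative)
R_m = 100e6        # membrane resistance in ohms (illustrative)
E_rest = -70e-3    # resting potential in volts
I_inj = 150e-12    # injected current in amperes
dt = 0.1e-3        # time step: 0.1 ms

V = E_rest
for step in range(int(50e-3 / dt)):          # simulate 50 ms
    dV = (-(V - E_rest) / R_m + I_inj) * dt / C_m
    V += dV

# Steady state approaches E_rest + R_m * I_inj = -70 mV + 15 mV = -55 mV
print(f"membrane potential after 50 ms: {V * 1e3:.1f} mV")
```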

Hodgkin-Huxley Model

Building on the equivalent-circuit idea, Alan Hodgkin and Andrew Huxley proposed the Hodgkin-Huxley (HH) model in the 1950s, drawing on their experimental research on the giant axon of the squid. The model includes conductances for sodium (Na), potassium (K), and leak currents, representing the degree to which each class of ion channel is open. The opening and closing of the ion channels are further described by voltage- and time-dependent gating variables (m, h, n). The equations of the HH model are as follows:

$$C_m \frac{dV}{dt} = I - \bar{g}_K\, n^4 (V - E_K) - \bar{g}_{Na}\, m^3 h\, (V - E_{Na}) - \bar{g}_L (V - E_L)$$

Where $V$ is the membrane potential and $I$ is the input current; $\bar{g}_K$, $\bar{g}_{Na}$, and $\bar{g}_L$ are the maximum conductances for potassium, sodium, and leak currents, respectively; $E_K$, $E_{Na}$, and $E_L$ are the corresponding equilibrium potentials; and $n$, $m$, and $h$ are the gating variables associated with the states of the potassium and sodium channels.

The gating dynamics can be described by the following differential equations:

$$\frac{dx}{dt} = \alpha_x(V)\,(1 - x) - \beta_x(V)\,x, \qquad x \in \{n, m, h\}$$

The $\alpha_x(V)$ and $\beta_x(V)$ functions represent the voltage-dependent rates of channel opening and closing, which Hodgkin and Huxley determined experimentally using the voltage-clamp technique.
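In practice the HH equations are solved numerically. Below is a compact, illustrative Python sketch using the standard textbook squid-axon parameters (modern convention, resting potential near −65 mV) and forward-Euler integration; it is a toy demonstration rather than a validated implementation.

```python
import numpy as np

# Minimal Hodgkin-Huxley simulation (standard squid-axon parameters), forward Euler
C_m = 1.0                                  # membrane capacitance, uF/cm^2
g_Na, g_K, g_L = 120.0, 36.0, 0.3          # max conductances, mS/cm^2
E_Na, E_K, E_L = 50.0, -77.0, -54.387      # reversal potentials, mV

# Voltage-dependent opening/closing rates (1/ms) for the gating variables
def a_n(V): return 0.01 * (V + 55) / (1 - np.exp(-(V + 55) / 10))
def b_n(V): return 0.125 * np.exp(-(V + 65) / 80)
def a_m(V): return 0.1 * (V + 40) / (1 - np.exp(-(V + 40) / 10))
def b_m(V): return 4.0 * np.exp(-(V + 65) / 18)
def a_h(V): return 0.07 * np.exp(-(V + 65) / 20)
def b_h(V): return 1.0 / (1 + np.exp(-(V + 35) / 10))

dt, T = 0.01, 50.0                         # time step and total duration, ms
V, n, m, h = -65.0, 0.317, 0.053, 0.596    # approximate resting-state values
spikes = 0
for t in np.arange(0, T, dt):
    I_ext = 10.0 if t > 5.0 else 0.0       # step current, uA/cm^2
    I_Na = g_Na * m**3 * h * (V - E_Na)
    I_K  = g_K  * n**4     * (V - E_K)
    I_L  = g_L             * (V - E_L)
    V_new = V + dt * (I_ext - I_Na - I_K - I_L) / C_m
    n += dt * (a_n(V) * (1 - n) - b_n(V) * n)
    m += dt * (a_m(V) * (1 - m) - b_m(V) * m)
    h += dt * (a_h(V) * (1 - h) - b_h(V) * h)
    if V < 0.0 <= V_new:                   # count upward zero crossings as spikes
        spikes += 1
    V = V_new

print(f"spikes in {T:.0f} ms at 10 uA/cm^2: {spikes}")
```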

Leaky Integrate-and-Fire Model (LIF)

The Leaky Integrate-and-Fire model (LIF) is a commonly used mathematical model in neuroscience that simplifies the action potential of neurons. This model focuses on describing the temporal changes in membrane potential [4-5] while neglecting the complex ionic dynamics within biological neurons.

Scientists have found that when a continuous current input is applied to a neuron [6-7], the membrane potential rises until it reaches a certain threshold, leading to the firing of an action potential, after which the membrane potential rapidly resets and the process repeats. Although the LIF model does not describe the specific dynamics of ion channels, its high computational efficiency has led to its widespread application in neural network modeling and theoretical neuroscience research. Its basic equation is as follows:

$$\tau_m \frac{dV}{dt} = -\big(V - V_{rest}\big) + R_m I(t), \qquad \tau_m = R_m C_m$$

Where $V$ is the membrane potential; $V_{rest}$ is the resting membrane potential; $I(t)$ is the input current; $R_m$ is the membrane resistance; $\tau_m$ is the membrane time constant, reflecting the rate at which the membrane potential responds to the input current; and $C_m$ is the membrane capacitance.

In this model, when the membrane potential reaches a specific threshold value $V_{th}$, the neuron fires an action potential (spike). Subsequently, the membrane potential is reset to a lower value $V_{reset}$ to simulate the actual process of neuronal firing.
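A minimal Python sketch of this integrate-and-fire loop (illustrative parameter values) makes the threshold-and-reset mechanism explicit:

```python
# Minimal leaky integrate-and-fire neuron (illustrative parameters)
tau_m   = 20.0     # membrane time constant, ms
R_m     = 10.0     # membrane resistance, MOhm
V_rest  = -70.0    # resting potential, mV
V_th    = -55.0    # firing threshold, mV
V_reset = -75.0    # reset potential after a spike, mV
dt      = 0.1      # time step, ms

V = V_rest
spike_times = []
for step in range(int(200 / dt)):          # simulate roughly 200 ms
    t = step * dt
    I = 2.0                                # constant input current, nA
    dV = (-(V - V_rest) + R_m * I) * dt / tau_m
    V += dV
    if V >= V_th:                          # threshold crossing -> spike
        spike_times.append(t)
        V = V_reset                        # reset membrane potential

print(f"{len(spike_times)} spikes; first at {spike_times[0]:.1f} ms")
```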

(2) Conduction of Electrical Signals in Neuronal Processes (Cable Theory)

In the late 19th to early 20th centuries, scientists began to recognize that electrical signals in neurons could propagate through elongated neural fibers such as axons and dendrites. However, as the distance increases, signals tend to attenuate. Researchers needed a theoretical tool to explain the propagation of electrical signals in neural fibers, particularly the voltage changes over long distances.

In 1907, physicist Wilhelm Hermann proposed a simple theoretical framework that likened nerve fibers to cables to describe the diffusion process of electrical signals. This theory was later further developed in the mid-20th century by Hodgkin, Huxley, and others, who confirmed the critical role of ion flow in signal propagation through experimental measurements of neurons and established mathematical models related to cable theory.

The core idea of cable theory is to treat nerve fibers as segments of a cable, introducing electrical parameters such as resistance and capacitance to simulate the propagation of electrical signals (typically action potentials) within nerve fibers. Nerve fibers, such as axons and dendrites, are viewed as one-dimensional cables, with electrical signals propagating along the length of the fiber; membrane electrical activity is described through resistance and capacitance, with current conduction influenced by internal resistance and membrane leakage resistance; the signal gradually attenuates as it propagates through the fiber.

▷ Fig.8: Schematic diagram of cable theory.

The cable equation is:

$$c_M \frac{\partial V(x,t)}{\partial t} = \frac{a}{2\rho_L} \frac{\partial^2 V(x,t)}{\partial x^2} - i_{ion}$$

Where $c_M$ represents the membrane capacitance per unit area, reflecting the role of the neuronal membrane as a capacitor; $a$ is the radius of the nerve fiber, which affects the propagation range of the electrical signal; $\rho_L$ is the resistivity of the nerve fiber's axial cytoplasm, describing the ease of current propagation along the fiber; and $i_{ion}$ is the ionic current density, representing the flow of ion currents through the membrane.

The temporal term $c_M\,\partial V(x,t)/\partial t$ reflects how the membrane potential changes over time; the spatial term $(a/2\rho_L)\,\partial^2 V(x,t)/\partial x^2$ describes the gradual spread and attenuation of the signal along the nerve fiber, which depends on the fiber's resistance and geometry; and $i_{ion}$ is the ionic current through the membrane, which controls the generation and recovery of the action potential. The opening of ion channels is fundamental to signal propagation.
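The cable equation is also easy to explore numerically. The sketch below discretizes a purely passive fiber in space (arbitrary, non-dimensional units) and holds one end depolarized, showing how the voltage decays with distance from the injection site; it illustrates the attenuation described above rather than a biophysically calibrated model.

```python
import numpy as np

# Finite-difference sketch of a passive cable in arbitrary units:
# c_m dV/dt = D * d2V/dx2 - g_L * V, with V measured relative to rest.
nx, dx = 200, 0.1          # number of segments and segment length
dt     = 0.001             # time step (small enough for explicit stability)
D      = 1.0               # lumped diffusion coefficient, a / (2 * rho_L * c_m)
g_L    = 1.0               # leak conductance over c_m

V = np.zeros(nx)           # membrane potential along the fiber, relative to rest
for step in range(5000):
    # second spatial derivative, with crude no-flux treatment at the far end
    d2V = np.zeros(nx)
    d2V[1:-1] = (V[2:] - 2 * V[1:-1] + V[:-2]) / dx**2
    V += dt * (D * d2V - g_L * V)
    V[0] = 1.0             # hold one end depolarized (constant-voltage injection)

# Voltage decays roughly exponentially with distance from the injection site
for x_index in (0, 10, 20, 40, 80):
    print(f"x = {x_index * dx:4.1f}: V = {V[x_index]:.3f}")
```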

 

(3) Multi-Compartment Model

In earlier neuron modeling, such as the HH model and cable theory, neurons were simplified to a point-like “single compartment,” only considering temporal changes in membrane potential while neglecting the spatial distribution of various parts of the neuron. These models are suitable for describing the mechanisms of action potential generation but fail to fully explain signal propagation in the complex morphological structures of neurons (such as dendrites and axons).

As neuroscience deepened its understanding of the complexity of neuronal structures, scientists recognized that voltage changes in different parts of the neuron can vary significantly, especially in neurons with long dendrites. Signal propagation in dendrites and axons is influenced not only by the spatial diffusion of electrical signals but also by structural complexity, resulting in different responses. Thus, a more refined model was needed to describe the spatial propagation of electrical signals in neurons, leading to the development of the multi-compartment model.

The core idea of the multi-compartment model is to divide the neuron’s dendrites, axons, and cell body into multiple interconnected compartments, with each compartment described using equations similar to those of cable theory to model the changes in transmembrane potential $V$ over time and space. By connecting multiple compartments, the model simulates the complex propagation pathways of electrical signals within neurons and reflects the voltage differences between different compartments. This approach allows for precise description of electrical signal propagation in the complex morphology of neurons, particularly the attenuation and amplification of electrical signals on dendrites.

Specifically, neurons are divided into multiple compartments, each representing a portion of the neuron (such as dendrites, axons, or a segment of the cell body). Each compartment is represented by a circuit model, with resistance and capacitance used to describe the electrical properties of the membrane. The transmembrane potential is determined by factors such as current injection, diffusion, and leakage. Adjacent compartments are connected by resistors, and electrical signals propagate between compartments through these connections. In the $i$-th compartment, the transmembrane potential $V_i$ follows a differential equation similar to that of cable theory:

$$C_i \frac{dV_i}{dt} = -I_{mem,i}(t) + \sum_{j \in \mathrm{neighbors}(i)} \frac{V_j - V_i}{R_{axial,ij}}$$

Where $C_i$ is the membrane capacitance of compartment $i$, $I_{mem,i}(t)$ is its membrane (ionic and leak) current, and $R_{axial,ij}$ is the axial resistance between adjacent compartments $i$ and $j$. These coupled equations describe how the signal propagates and attenuates within different compartments of the neuron.

In the multi-compartment model, certain compartments (such as the cell body or initial segment) can generate action potentials, while others (like dendrites or axons) primarily facilitate the propagation and attenuation of electrical signals. Signals are transmitted through connections between different compartments, with input signals in the dendritic region ultimately integrated at the cell body to trigger action potentials, which then propagate along the axon.

Compared to single-compartment models, the multi-compartment model can better reflect the complexity of neuronal morphological structures, particularly the propagation of electrical signals within structures like dendrites and axons. Because it involves coupled differential equations across many compartments, the multi-compartment model usually must be solved numerically (for example, with the Euler or Runge–Kutta methods).
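As a minimal illustration of the compartmental idea, the sketch below couples just two passive compartments (a "dendrite" receiving injected current and a "soma") through an axial resistance and integrates them with forward Euler; parameter values are illustrative. The dendritic compartment settles a few millivolts above the soma, showing the attenuation between compartments.

```python
import numpy as np

# Two passive compartments ("dendrite" and "soma") coupled by an axial resistance
C   = np.array([100e-12, 200e-12])   # capacitance of each compartment, F
g_L = np.array([5e-9, 10e-9])        # leak conductance of each compartment, S
E_L = -70e-3                         # leak reversal potential, V
R_axial = 50e6                       # axial resistance between compartments, Ohm
dt = 0.05e-3                         # time step, 0.05 ms

V = np.array([E_L, E_L])             # initial potentials
for step in range(int(100e-3 / dt)): # simulate roughly 100 ms
    I_inj = np.array([100e-12, 0.0]) # inject current into compartment 0 only, A
    I_leak = g_L * (V - E_L)
    I_axial = (V[0] - V[1]) / R_axial   # current flowing from compartment 0 into 1
    dV0 = (I_inj[0] - I_leak[0] - I_axial) / C[0]
    dV1 = (I_inj[1] - I_leak[1] + I_axial) / C[1]
    V += dt * np.array([dV0, dV1])

print(f"dendrite: {V[0] * 1e3:.1f} mV, soma: {V[1] * 1e3:.1f} mV")
```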

2.2.2 Why Conduct Complex Dynamic Simulations of Biological Neurons?

Research by Beniaguev et al. has shown that the complex dendritic structures and ion channels of different types of neurons in the brain enable a single neuron to possess extraordinary computational capabilities, comparable to those of a 5-8 layer deep learning network [8].

▷ Fig.9: A model of Layer 5 cortical pyramidal neurons with AMPA and NMDA synapses, accurately simulated using a temporal convolutional network (TCN) with seven hidden layers, each containing 128 feature maps, and a history window of 153 milliseconds.

He et al. focused on the relationships between different internal dynamics and complexities of neuron models [9]. They proposed a method for converting external complexity into internal complexity, noting that models with richer internal dynamics exhibit certain computational advantages. Specifically, they theoretically demonstrated the equivalence of dynamic characteristics between the LIF model and the HH model, showing that an HH neuron can be dynamically equivalent to four time-varying parameter LIF neurons (tv-LIF) with specific connection structures.

▷ Fig.10: A method for converting from the tv-LIF model to the HH model.

Building on this, they experimentally validated the effectiveness and reliability of HH networks in handling complex tasks. They discovered that the computational efficiency of HH networks is significantly higher compared to simplified tv-LIF networks (s-LIF2HH). This finding demonstrates that converting external complexity into internal complexity can enhance the computational efficiency of deep learning models. It suggests that the internal complexity small model approach, inspired by the complex dynamics of biological neurons, holds promise for achieving more powerful and efficient AI systems.

▷ Fig.11: Computational resource analysis of the LIF model, HH model, and s-LIF2HH.

Moreover, owing to limitations in their structure and computational mechanisms, existing artificial neural networks differ greatly from real brains, making them poorly suited for directly probing how real brains learn and perceive. Compared to artificial neural networks, neuron models with rich internal dynamics are closer to biological reality. They therefore play a crucial role in understanding the learning processes of real brains and the mechanisms of human intelligence.

2.3 Challenges

Despite the impressive performance of the internal complexity small model approach, it faces a series of challenges. The electrophysiological activity of neurons is often described by complex nonlinear differential equations, making model analysis and solution quite challenging. Due to the nonlinear and discontinuous characteristics of neuron models, using traditional gradient descent methods for learning becomes complex and inefficient. Furthermore, increasing internal complexity, as seen in models like HH, reduces hardware parallelism and slows down information processing speed. This necessitates corresponding innovations and improvements in hardware.

To tackle these challenges, researchers have developed various improved learning algorithms. For example, approximate gradients are used to address discontinuous characteristics, while second-order optimization algorithms capture curvature information of the loss function more accurately to accelerate convergence. The introduction of distributed learning and parallel computing allows the training process of complex neuron networks to be conducted more efficiently on large-scale computational resources.
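As a small illustration of the "approximate gradient" idea (often called a surrogate gradient in the spiking-network literature), the non-differentiable spike threshold is kept in the forward pass while a smooth function's derivative stands in for it during the backward pass. The NumPy sketch below shows only this substitution, not a full training loop:

```python
import numpy as np

def spike(v, threshold=1.0):
    # Forward pass: hard, non-differentiable threshold (0 or 1)
    return (v >= threshold).astype(float)

def surrogate_grad(v, threshold=1.0, beta=5.0):
    # Backward pass: derivative of a sigmoid centered at the threshold,
    # used as a smooth stand-in for the undefined derivative of spike()
    s = 1.0 / (1.0 + np.exp(-beta * (v - threshold)))
    return beta * s * (1.0 - s)

v = np.array([0.2, 0.9, 1.1, 2.0])      # illustrative membrane potentials
print("spikes:         ", spike(v))
print("surrogate grads:", np.round(surrogate_grad(v), 3))
```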

Additionally, bio-inspired learning mechanisms have garnered interest from some scholars. The learning processes of biological neurons differ significantly from current deep learning methods. For instance, biological neurons rely on synaptic plasticity for learning, which includes the strengthening and weakening of synaptic connections, known as long-term potentiation (LTP) and long-term depression (LTD). This mechanism is not only more efficient but also reduces the model’s dependence on continuous signal processing, thereby lowering the computational burden.
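A common computational formulation of LTP/LTD is spike-timing-dependent plasticity (STDP), in which the sign and size of a weight change depend on the relative timing of pre- and postsynaptic spikes. The following sketch uses illustrative parameter values:

```python
import numpy as np

def stdp_dw(delta_t, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for a pre/post spike pair separated by delta_t (ms),
    where delta_t = t_post - t_pre. Positive delta_t (pre before post)
    strengthens the synapse (LTP); negative delta_t weakens it (LTD)."""
    if delta_t > 0:
        return a_plus * np.exp(-delta_t / tau_plus)      # potentiation
    else:
        return -a_minus * np.exp(delta_t / tau_minus)    # depression

for dt_ms in (-40, -10, 10, 40):
    print(f"delta_t = {dt_ms:+d} ms -> dw = {stdp_dw(dt_ms):+.4f}")
```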

▷ MJ

3. Bridging the Gap Between Artificial Intelligence and Human Brain Intelligence

He et al. showed, through theoretical analysis and simulation, that smaller networks with greater internal complexity can replicate the functions of larger, simpler networks. This approach not only maintains performance but also improves computational efficiency, reducing memory usage by a factor of four and doubling processing speed. This suggests that increasing internal complexity may be an effective way to improve AI performance and efficiency.

Zhu and Eshraghian commented on He et al.’s article, “Network Model with Internal Complexity Bridges Artificial Intelligence and Neuroscience” [5]. They noted, “The debate over internal and external complexity in AI remains unresolved, with both approaches likely to play a role in future advancements. By re-examining and deepening the connections between neuroscience and AI, we may discover new methods for constructing more efficient, powerful, and brain-like artificial intelligence systems.”

As we stand at the crossroads of AI development, the field faces a critical question: Can we achieve the next leap in AI capabilities by more precisely simulating the dynamics of biological neurons, or will we continue to advance with larger models and more powerful hardware? Zhu and Eshraghian suggest that the answer may lie in integrating both approaches, which will continuously optimize as our understanding of neuroscience deepens.

Although the introduction of biological neuron dynamics has enhanced AI capabilities to some extent, we are still far from achieving the technological level required to simulate human consciousness. First, the completeness of the theory remains insufficient. Our understanding of the nature of consciousness is lacking, and we have yet to develop a comprehensive theory capable of explaining and predicting conscious phenomena. Second, simulating consciousness may require high-performance computational frameworks that current hardware and algorithm efficiencies cannot yet support. Moreover, efficient training algorithms for brain models remain a challenge. The nonlinear behavior of complex neurons complicates model training, necessitating new optimization methods. Many complex brain functions, such as long-term memory retention, emotional processing, and creativity, still require in-depth exploration of their specific neural and molecular mechanisms. How to further simulate these behaviors and their molecular mechanisms in artificial neural networks remains an open question. Future research must make breakthroughs on these issues to truly advance the simulation of human consciousness and intelligence.

Interdisciplinary collaboration is crucial for simulating human consciousness and intelligence. Cooperative research across mathematics, neuroscience, cognitive science, philosophy, and computer science will deepen our understanding and simulation of human consciousness and intelligence. Only through collaboration among different disciplines can we form a more comprehensive theoretical framework and advance this highly challenging task.

]]>
https://admin.next-question.com/features/bridge-the-gap/feed/ 0
A Quiet Revolution: How Focused Ultrasound is Redefining Non-Invasive Brain Treatment https://admin.next-question.com/features/fus-brain/ https://admin.next-question.com/features/fus-brain/#respond Wed, 30 Oct 2024 07:23:58 +0000 https://admin.next-question.com/?p=2535 From the moment we are born, our brains continuously receive auditory information from the external world. Language, carried by sound waves, shapes our cognition, and music evokes aesthetic experiences in our minds. When the frequency surpasses the range detectable by the human ear, ultrasound waves can also affect the brain. Among the continuously advancing technologies in recent years is focused ultrasound. This technology is akin to using a convex lens to concentrate sunlight and ignite a fire; by focusing ultrasound waves on a specific point, it generates powerful energy, achieving therapeutic effects through a relatively non-invasive method. Its application in biomedicine, particularly in brain science, is initiating a revolutionary transformation.

 

1. Principles

(1) Physical Basis

The sounds we hear in daily life fall within the range of human hearing, roughly 20 Hz to 20,000 Hz. Focused Ultrasound (FUS), however, utilizes ultrasound waves of much higher frequencies, far beyond the range of human hearing.

During the propagation of ultrasound waves, interference occurs, meaning the waves can mutually enhance or cancel each other out. By strategically arranging multiple ultrasound transducers, we can harness this interference to concentrate the energy of ultrasound waves at a specific focal point. This technique of focusing using ultrasound interference is known as FUS.

In an FUS system, each transducer can independently control the phase of the sound wave. By precisely calculating the phase of each transducer, a desired focal point can be produced. However, in practical applications, the shape and size of the focal point are also influenced by other factors, such as the propagation characteristics of the sound waves in different materials and the acoustic properties of the materials as they vary with temperature and frequency.
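As a rough numerical sketch of this phase calculation, the snippet below chooses per-element phase offsets for an idealized linear array focusing at a single point in a homogeneous medium (water-like sound speed, illustrative geometry and frequency), ignoring the skull and tissue effects discussed above:

```python
import numpy as np

c = 1500.0                  # approximate speed of sound in water/soft tissue, m/s
f = 650e3                   # ultrasound frequency, Hz (illustrative)

# Illustrative linear array of 16 elements spaced 2 mm apart along x, at z = 0
element_x = (np.arange(16) - 7.5) * 2e-3
focus = np.array([0.0, 0.0, 60e-3])        # desired focal point 60 mm deep

# Each element gets a phase advance that compensates its extra travel distance,
# so all wavefronts arrive at the focus in phase and interfere constructively.
distances = np.sqrt((element_x - focus[0])**2 + focus[2]**2)
phases = (2 * np.pi * f * (distances - distances.min()) / c) % (2 * np.pi)

# Elements farther from the focus receive a larger phase lead (fire earlier)
print(np.round(phases, 2))
```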

To overcome these issues, some FUS systems adopt “dual-mode ultrasound” technology. This involves the simultaneous use of a separate probe for ultrasound imaging while therapeutic ultrasound is being applied, allowing real-time monitoring of the focal point’s position and size and timely adjustment of focusing parameters to optimize therapeutic effects. This technology is currently used in the treatment of localized organ diseases such as the prostate.

The structural design of the ultrasound probe is also crucial for FUS. Different geometric configurations can produce different ultrasound beam shapes, making them suitable for various applications. In neurosurgery, tightly spaced, unidirectional transducer arrays are typically used, imaging along a straight path through small openings in the skull. Besides the geometric configuration of the transducer array, the parameters of the ultrasound waves themselves, such as frequency and amplitude, can also be adjusted. To avoid excessive heat generation, ultrasound is usually delivered in pulses, with the pulse repetition frequency and pulse duration also being adjustable.

 

(2) Biological Effects

When ultrasound waves penetrate biological tissues, they induce a series of complex physical processes, which can be broadly categorized into thermal and non-thermal effects.

The temperature rise induced by ultrasound in tissues primarily depends on the intensity of the sound waves and the tissue’s absorption properties. Generally, the higher the ultrasound frequency, the shallower the penetration depth but the higher the resolution. This means a balance must be struck between penetration depth, resolution, and frequency. When ultrasound generates heat within tissues, the tissue’s impedance and thermal conductivity properties affect the diffusion of heat. Physiological cooling mechanisms, such as blood perfusion and thermal diffusion, also play significant roles in the tissue heating process.
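To give a feel for this trade-off, a commonly cited rule of thumb is that soft tissue attenuates ultrasound by roughly 0.5 dB per centimeter per megahertz; the exact coefficient varies widely by tissue type (and is far higher in bone), so the numbers below are only indicative:

```python
# Rough attenuation estimate (rule-of-thumb value; real tissues vary widely)
attenuation_coeff = 0.5          # dB per cm per MHz, approximate for soft tissue
depth_cm = 5.0                   # one-way path length through tissue

for freq_mhz in (0.5, 1.0, 3.0, 10.0):
    loss_db = attenuation_coeff * freq_mhz * depth_cm
    fraction_remaining = 10 ** (-loss_db / 10)   # intensity ratio
    print(f"{freq_mhz:4.1f} MHz: {loss_db:5.1f} dB loss over {depth_cm:.0f} cm "
          f"({fraction_remaining * 100:.1f}% of intensity remains)")
```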

High-Intensity Focused Ultrasound (HIFU) can generate temperatures within tissues high enough to alter protein structures and coagulate tissues. It has been clinically applied to ablate kidney stones, tumors, and treat certain brain lesions causing movement disorders. In contrast, Low-Intensity Focused Ultrasound (LIFU) induces temperature changes within the normal physiological range, avoiding irreversible damage.

The non-thermal effects of FUS include mechanical forces, radiation forces, and some organ-specific effects, such as reversible opening of the blood-brain barrier and altering neuronal membrane potentials.

The mechanical effects of FUS are evident in its ability to directly act on certain mechanosensitive ion channels and proteins, including sodium and potassium channels, thereby altering the state of neurons. High-intensity ultrasound can also physically tear tissues, but its safety is challenging to assess.

Additionally, when ultrasound intensity is sufficiently high, cavitation occurs. Cavitation refers to the growth and collapse of microbubbles during the compression and expansion phases of sound waves. The threshold for cavitation depends on factors such as sound wave frequency, temperature, and pressure. Microbubble nuclei are needed to initiate cavitation, serving as the starting points for the growth, oscillation (stable cavitation), or violent collapse (inertial cavitation) of microbubbles. In HIFU, gases released due to thermal effects can become the primary source of these microbubble nuclei. Cavitation can influence cell membrane potentials and induce micro-streaming, forming turbulence that affects surrounding cells.

In summary, FUS can produce thermal and non-thermal effects in biological tissues, with distinct impacts. HIFU can raise tissue temperatures to 43-60 degrees Celsius, causing time-dependent damage and, at higher intensities, immediate tissue damage. This damage is mainly achieved through thermal and cavitation effects. With advancements in non-invasive temperature monitoring technology, MRI-assisted HIFU therapy has gained widespread application, allowing precise control of lesion size and ensuring safety.

Conversely, LIFU can induce reversible neurophysiological responses, such as increasing or decreasing neuronal firing rates and conduction velocity, inhibiting visual and somatosensory evoked potentials, EEG, and epileptic seizures. The exact mechanisms of LIFU remain uncertain and may involve thermal effects, mechanical effects, and changes in ion channel activity, warranting further research.

 

 Figure 1. Biological effects of FUS. Source: Meng, Ying, Kullervo Hynynen, and Nir Lipsman. “Applications of focused ultrasound in the brain: from thermoablation to drug delivery.” Nature Reviews Neurology 17.1 (2021): 7-22.

 

2. Technological Development

In 1935, Gruetzmacher designed a curved quartz plate that could focus ultrasound waves to a single point, leading to the birth of the first FUS transducer. Eight years later, Lynn and colleagues at Columbia University in the United States first reported the application of FUS in the brain during animal experiments. They discovered that by instantly raising HIFU to its maximum intensity, the effects at the focal point could be maximized while minimizing damage to nearby areas.

Despite the technological limitations at the time, these findings established HIFU as a feasible method for creating a precise focal point while reducing damage along the path. They also found that surface and along-the-path damage were inversely proportional to the distance from the focal point, suggesting that the technology might be more suitable for targeting deep brain areas. Additionally, using lower frequencies could reduce absorption and heating of surface tissues, favoring absorption at the focal point. They also found that focused ultrasound could create reversible nerve damage, with ganglion cells being more susceptible than glial cells and blood vessels. These findings laid the groundwork for the subsequent development of FUS by demonstrating its ability to produce safe, reversible effects in biological tissues.

Subsequently, William Fry and Francis Fry from the University of Illinois further advanced FUS technology. Early studies showed that focused ultrasound could damage surface tissues like the scalp and skull, affecting the focus. To address this issue, the Fry brothers decided to apply focused ultrasound directly to the dura mater through craniotomy.

In 1954, the Fry research team published a seminal paper describing their method of targeting deep brain structures using a device with four focused ultrasound beams (see Figure 2). This device could be used in conjunction with stereotactic equipment, demonstrating for the first time the effectiveness of combining focused ultrasound with stereotactic techniques in animal models. They successfully ablated the thalamus and internal capsule in 31 cats, with histological examination showing cellular changes in the target area within two hours of exposure. Unlike Lynn’s findings, this experiment primarily damaged nerve fibers while leaving the neuronal cell bodies in the target area largely unaffected. Additionally, there was no significant damage to blood vessels and surrounding tissues.

 

 Figure 2. The four-beam focused ultrasound device used by the Frys. Source: Harary, Maya, et al. “Focused ultrasound in neurosurgery: a historical perspective.” Neurosurgical Focus 44.2 (2018): E2.

Meanwhile, the Fry team used precise focused ultrasound stimulation of the lateral geniculate body to temporarily suppress the brain’s response to retinal flash stimuli. Specifically, electrodes were placed on the visual cortex to measure the brain’s electrophysiological response to light stimuli. During focused ultrasound exposure, the amplitude of these evoked potentials decreased to less than one-third of the baseline value. Surprisingly, once the ultrasound stimulation ceased, these electrophysiological indicators returned to their original levels within 30 minutes. More importantly, this dose of focused ultrasound did not cause any observable histological damage to the underlying neural tissue. This discovery pioneered a new concept: FUS neuromodulation.

After achieving success in animal experiments, the Fry laboratory collaborated with the Neurosurgery Department at the University of Iowa to apply FUS in human neurosurgery. They targeted deep brain regions in patients with Parkinson’s disease, attempting to treat tremors and rigidity with FUS. In 1960, Meyers and Fry published a treatment study involving 48 patients, demonstrating the therapeutic effects of FUS on Parkinsonian tremors and rigidity.

By the latter half of the 20th century, the therapeutic potential of FUS had gradually gained recognition. However, to avoid damage and distortion to surface tissues when passing through the skull, craniotomy was necessary, making the procedure still invasive. For FUS to further advance, two critical issues needed to be addressed: transcranial focusing and real-time monitoring.

To achieve transcranial focusing, FUS had to overcome two major obstacles: local overheating of the skull and beam propagation distortion due to tissue inhomogeneity. Bone absorbs ultrasound waves 30-60 times more than soft tissue. Early experiments found that the interaction between ultrasound waves and the skull led to rapid local heating of the skull, limiting the safe level of energy that could be applied. This issue was eventually resolved by using low-frequency hemispherical transducers and actively cooling the scalp. Low frequencies reduced surface absorption, while hemispherical transducers distributed local heating over a larger surface area, and scalp cooling prevented excessive heating.

Additionally, beam propagation and focusing were significantly distorted due to the acoustic impedance mismatch between bone and brain, as well as individual variations in skull shape, thickness, and the ratio of cortical bone to marrow. Until the early 1990s, this problem remained unresolved. The emergence of phased array technology, which corrects for delays and changes encountered during wave propagation by applying different phase shifts to each element, finally enabled precise targeting previously achievable only through craniotomy. Coupled with acoustic feedback techniques to accurately measure phase shifts caused by the human skull, focused ultrasound technology overcame this critical obstacle. These groundbreaking advancements laid the foundation for the modern development of focused ultrasound, making completely non-invasive treatment of deep brain structures possible.

Early applications of HIFU thermal ablation were primarily by surgeons for treating prostate, urinary system, breast, and gynecological tumors. In these applications, physicians could use diagnostic ultrasound to guide and monitor the treatment process in real-time. However, in neurosurgical applications, the skull impeded ultrasound imaging of internal tissue changes. In the late 1980s and early 1990s, Dr. Jolesz’s team pioneered the use of intraoperative magnetic resonance imaging (MRI) to address this issue. Subsequently, they turned their attention to using MR thermometry to monitor temperature changes within the brain in real-time during focused ultrasound treatment.

By the late 1990s, the Jolesz team discovered that low-power FUS could raise the temperature of the target area to 40-42 degrees Celsius without causing damage. This sub-threshold ultrasound exposure generated a thermal signal that could be located and targeted using MR thermometry, preparing for subsequent high-power ablative exposure. In the following years, Jolesz and colleagues focused on characterizing the thermodynamics, ultimately achieving the prediction of lesion size after continuous exposure and real-time monitoring of the thermal damage process.

 

3. Clinical Applications

(1) HIFU-based Thermal Ablation Therapy

HIFU can produce therapeutic effects by raising the temperature of the target tissue. When the temperature increases to 40-45 degrees Celsius, it can enhance the sensitivity of tumors to radiotherapy or aid in the release of drugs from thermosensitive liposomes. When the temperature exceeds 56 degrees Celsius, it causes tissue denaturation and necrosis.

For common tremor disorders, such as essential tremor, focused ultrasound can target and ablate key areas like the ventral intermediate nucleus (VIM) of the thalamus or the cerebellar-thalamic tract (CTT), effectively alleviating patients’ tremor symptoms. Numerous clinical studies have confirmed that unilateral FUS surgery targeting the VIM or CTT significantly improves patients’ tremor and quality of life, with most adverse effects, such as sensory disturbances and gait abnormalities, being temporary.

In Parkinson’s disease, focused ultrasound also offers multiple target options. For patients primarily exhibiting tremors, ablation of the VIM can be chosen; for motor disorders, the subthalamic nucleus (STN) or globus pallidus internus (GPi) can be targeted; for motor complications, the pallidothalamic tract (PTT) can be targeted. These FUS surgeries effectively improve motor symptoms in Parkinson’s patients, although adverse effects such as speech disorders may occur.

Additionally, focused ultrasound has applications in treating psychiatric disorders like obsessive-compulsive disorder (OCD) and depression. By ablating the anterior limb of the internal capsule (ALIC), it can effectively alleviate symptoms such as obsessive thoughts, depression, and anxiety without causing cognitive decline.

 

 Figure 3. Applications of FUS in the human brain. Source: Meng, Ying, Kullervo Hynynen, and Nir Lipsman. “Applications of focused ultrasound in the brain: from thermoablation to drug delivery.” Nature Reviews Neurology 17.1 (2021): 7-22.

 

(2) Opening the Blood-Brain Barrier

The blood-brain barrier (BBB) is a barrier formed by the walls of brain capillaries, glial cells, and the choroid plexus. Its main function is to regulate the entry and exit of substances into the brain, maintaining the brain’s stable environment. Although the BBB blocks harmful substances from entering the brain, it also hinders the entry of drugs, especially large-molecule drugs, for disease treatment.

Research has found that low-intensity focused ultrasound (LIFU) can safely and reversibly open the BBB. After injecting microbubbles, ultrasound causes these microbubbles to oscillate, temporarily disrupting the tight junctions of the BBB and allowing better drug penetration into the brain. This method has been shown in animal experiments to effectively enhance the treatment of neurological diseases such as brain tumors, Parkinson’s disease, and Alzheimer’s disease.

Furthermore, this technique of opening the BBB can non-invasively release biomarkers such as phosphorylated tau protein into the blood, aiding in the early diagnosis and monitoring of neurodegenerative diseases and brain tumors. It can also modulate the neuroimmune system to achieve therapeutic effects, such as reducing amyloid plaques and hyperphosphorylated tau protein in Alzheimer’s disease models, promoting adult neurogenesis, and altering the tumor microenvironment.

 

(3) LIFU-based Neuromodulation

In addition to opening the BBB, LIFU can precisely modulate neural activity in specific brain regions by altering the permeability of neuronal cell membranes and activating ion channels. Clinical studies have confirmed that focused ultrasound can modulate cortical functions in the human brain, inducing plastic changes. It can alter functional connectivity in the brain and affect neurochemical substances in the deep cortex. Some studies have shown that using a navigation system to precisely target specific brain regions can safely and effectively reduce the frequency of seizures in epilepsy patients, improve symptoms of neurodegenerative diseases, alleviate neuropathic pain, and reduce depression.

Compared to existing neuromodulation techniques, FUS has several potential advantages: unlike transcranial direct current stimulation (tDCS) and transcranial magnetic stimulation (TMS), FUS can target deep brain regions with millimeter-level spatial resolution. Compared to deep brain stimulation (DBS), FUS is less invasive, avoiding surgical risks and allowing for repeated treatments. By adjusting the position or direction of the transducer, multiple brain regions such as the hippocampus, prefrontal cortex, motor cortex, caudate nucleus, and substantia nigra can be stimulated.

 

 Figure 4. Research status of FUS therapy for brain diseases. Source: Focused Ultrasound Foundation. “State of the Field Report 2023 – Focused Ultrasound Foundation.” Focused Ultrasound Foundation, 20 Sept. 2023, www.fusfoundation.org/the-foundation/foundation-reports/state-of-the-field-report-2023.

 

4. Conclusion

The role of FUS in neuroscience and clinical therapy is increasingly prominent. By precisely controlling the focused energy of sound waves, FUS not only enables precise treatment of brain lesions but also shows great potential in neuromodulation and drug delivery. Its characteristics include high localization accuracy and relative non-invasiveness.

As of 2022, the FUS field has received $3.14 billion in R&D investments from government and industry, with 337 therapies approved by 39 regulatory agencies targeting 32 indications and treating a total of 565,210 cases. Currently, dozens of therapies are still under active development.

However, the development of FUS still faces a series of technical and clinical challenges. High-intensity FUS thermal ablation therapy is currently inefficient for large lesions and surrounding brain areas, and its application is limited in patients with lower skull density. Additionally, for lesions near the skull base, the surrounding sensitive neurovascular structures may be at risk. It is anticipated that in the coming years, optimized ultrasound focusing and correction technologies, as well as more personalized ultrasound transducer arrays, will emerge to minimize heating and expand the treatment range.

 

 Figure 5. Changes in the number of FUS-related publications over time. Source: Meng, Ying, Kullervo Hynynen, and Nir Lipsman. “Applications of focused ultrasound in the brain: from thermoablation to drug delivery.” Nature Reviews Neurology 17.1 (2021): 7-22.

Clinically, current research aims to improve the tolerability of FUS treatment, for example, by shortening surgery times and using neuroimaging-assisted devices. Additionally, exploring the application of FUS in new clinical indications, such as the treatment of brain lesions inducing epilepsy, is a future direction of development.

The future of FUS applications depends on interdisciplinary collaboration, involving joint efforts from medicine, physics, and neuroscience. Through in-depth research into its mechanisms and clinical applications, FUS holds immense potential to help us tackle some of the most challenging brain diseases faced by humanity.

]]>
https://admin.next-question.com/features/fus-brain/feed/ 0