Why Haven’t Large Language Models “Killed” Psychology?

Since the end of 2022, ChatGPT has swept across the globe like a tidal wave, and people are eagerly anticipating its potential applications. Business professionals, scholars, and even ordinary individuals are pondering the same question: How will AI shape the future of our work?

As time goes by, many concepts are gradually becoming reality. Humanity seems to have grown accustomed to AI assisting us or even replacing us in many work scenarios. Early fears about GPT have gradually dissipated; instead, people have become overly reliant on GPT, even overlooking possible limitations and risks. We refer to this excessive dependence on GPT while ignoring its risks as “GPTology.”

The development of psychology has always closely followed technological innovation. Sociologists and behavioral scientists have consistently leveraged technology to collect rich and diverse data. Technologies ranging from neuroimaging and online survey platforms to eye-tracking devices have all contributed to critical breakthroughs in psychology. The digital revolution and the rise of big data have fostered new disciplines like computational social science. Just as in other fields (medicine [1], politics [2]), large language models (LLMs) that can understand, generate, and translate human language with astonishing subtlety and complexity have also had a profound impact on psychology.

In psychology, there are two main applications for large language models: On one hand, studying the mechanisms of LLMs themselves may provide new insights into human cognition. On the other hand, their capabilities in text analysis and generation make them powerful tools for analyzing textual data. For example, they can transform textual data such as individuals’ written or spoken expressions into analyzable data forms, thereby assisting mental health professionals in assessing and understanding an individual’s psychological state. Recently, numerous studies have emerged using large language models to advance psychological research. Applications of ChatGPT in social and behavioral sciences, such as hate speech classification and sentiment analysis, have shown promising initial results and have broad development prospects.

However, should we allow the current momentum of “GPTology” to run rampant in the research field? In fact, the integration process of all technological innovations is always full of turbulence. Allowing unchecked application of a certain technology and becoming overly reliant on it may lead to unexpected consequences. Looking back at the history of psychology, when functional magnetic resonance imaging (fMRI) technology first emerged, some researchers abused it, leading to absurd yet statistically significant neural association phenomena—for instance, researchers performed an fMRI scan on a dead Atlantic salmon and found that the fish displayed significant brain activity during the experiment. Other studies have indicated that due to statistical misuse, the likelihood of finding false correlations in fMRI research is extremely high. These studies have entered psychology textbooks, warning all psychology students and researchers to remain vigilant when facing new technologies.

 

▷ Abdurahman, Suhaib, et al. “Perils and opportunities in using large language models in psychological research.” PNAS Nexus 3.7 (2024): pgad245.

We can say that we have entered a “cooling-off period” in our relationship with large language models. Besides considering what large language models can do, we need to reflect more on whether and why we should use them. A recent review paper in PNAS Nexus explores the application of large language models in psychological research and the new opportunities they bring to the study of human behavior.

The article acknowledges the potential utility of LLMs in enhancing psychology but also emphasizes caution against their uncritical application. At present, these models can produce statistically significant but meaningless or ambiguous correlations in psychological research, a pitfall researchers must guard against. The authors remind us that, given similar challenges encountered in recent decades (such as the replication crisis), researchers should be cautious in applying LLMs. The paper also proposes directions for using these models more critically and prudently in the future to advance psychological research.
 

1. Can Large Language Models Replace Human Subjects?

When it comes to large language models (LLMs), people’s most intuitive impression is of their highly “human-like” output capabilities. Webb et al. examined ChatGPT’s analogical reasoning abilities [3] and found that it has already exhibited zero-shot reasoning capabilities, able to solve a wide range of analogical reasoning problems without explicit training. Some believe that if LLMs like ChatGPT can indeed produce human-like responses to common psychological measurements—such as judgments of actions, value endorsements, and views on social issues—they may potentially replace human subject groups in the future.

Addressing this question, Dillon and colleagues conducted a dedicated study [4]. They first compared the moral judgments of humans and the language model GPT-3.5, affirming that language models can replicate some human judgments. However, they also highlighted challenges in interpreting language model outputs. Fundamentally, the “thinking” of LLMs is built upon human natural expressions, but the actual population they represent is limited, and there is a risk of oversimplifying the complex thoughts and behaviors of humans. This serves as a warning because the tendency to anthropomorphize AI systems may mislead us into expecting these systems—operating on fundamentally different principles—to exhibit human-like performance.

Current research indicates that using LLMs to simulate human subjects presents at least three major problems.

First, cross-cultural differences in cognitive processes are an extremely important aspect of psychological research, but much evidence shows that current popular LLMs cannot simulate such differences. Models like GPT are mainly trained on text data from WEIRD (Western, Educated, Industrialized, Rich, Democratic) populations. This English-centric pipeline perpetuates the English-centrism already present in psychology and runs counter to calls for linguistic diversity. As a result, language models find it difficult to accurately reflect the diversity of the general population. For example, ChatGPT exhibits gender biases favoring male perspectives and narratives, cultural biases favoring American viewpoints or majority populations, and political biases favoring liberalism, environmentalism, and left-libertarian views. These biases also extend to personality traits, morality, and stereotypes.

Overall, because the model outputs strongly reflect the psychology of WEIRD populations, high correlations between AI and human responses cannot be reproduced when human samples are less WEIRD. In psychological research, over-reliance on WEIRD subjects (such as North American college students) once sparked discussions. Replacing human participants with LLM outputs would be a regression, making psychological research narrower and less universal.

▷ Comparing ChatGPT’s responses to the “Big Five” personality traits with human responses grouped by political views. Note: The figure shows the distribution of responses from humans and ChatGPT on the Big Five personality dimensions and different demographic data. ChatGPT gives significantly higher responses in agreeableness and conscientiousness and significantly lower responses in openness and neuroticism. Importantly, compared to all demographic groups, ChatGPT shows significantly smaller variance across all personality dimensions.

Second, LLMs seem to have a preference for “correct answers.” They exhibit low variability when answering psychological survey questions—even when the topics involved (such as moral judgments) do not have actual correct answers—while human responses to these questions are often diverse. When we ask LLMs to answer the same question multiple times and measure the variance in their answers, we find that language models cannot produce the significant ideological differences that humans do. This is inseparable from the principles behind generative language models; they generate output sequences by calculating the probability distribution of the next possible word in an autoregressive manner. Conceptually, repeatedly questioning an LLM is similar to repeatedly asking the same participant, rather than querying different participants.

However, psychologists are usually interested in studying differences between different participants. This warns us that when attempting to use LLMs to simulate human subjects, we cannot simply use language models to simulate group averages or an individual’s responses across different tasks. Appropriate methods should be developed to truly reproduce the complexity of human samples. Additionally, the data used to train LLMs may already contain many items and tasks used in psychological experiments, causing the model to rely on memory rather than reasoning when tested, further exacerbating the above issues. To obtain an unbiased evaluation of LLMs’ human-like behavior, researchers need to ensure that their tasks are not part of the model’s training data or adjust the model to avoid affecting experimental results, such as through methods like “unlearning.”
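As a concrete illustration of the repeated-querying point above, the sketch below (assuming a hypothetical `query_llm` wrapper around whatever chat-model API is being used) re-asks a single survey item many times and measures the spread of the numeric answers; comparing that spread with the between-person variance of a human sample is one simple diagnostic of whether the model is behaving more like one participant than like many.

```python
import statistics

def query_llm(prompt, temperature=1.0):
    """Hypothetical wrapper around a chat-model API call.

    Stands in for an OpenAI, Anthropic, or local-model client; the real
    call and its parameters depend on the provider being used.
    """
    raise NotImplementedError

def response_variance(prompt, n_repeats=100):
    """Ask the model the same survey item many times and measure the
    spread of its (numeric) answers."""
    answers = [float(query_llm(prompt)) for _ in range(n_repeats)]
    return statistics.variance(answers)

# Conceptually this is closer to re-surveying one participant than to
# sampling n_repeats different participants: the spread typically comes
# out much narrower than the between-person variance in human data.
item = "On a scale of 1-7, how morally wrong is lying to protect a friend?"
# print(response_variance(item))
```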

Finally, it is questionable whether GPT has truly formed a moral system similar to that of humans. By querying the LLM and constructing its internal nomological network—observing the correlations between different moral domains—it was found that these metrics differ significantly from results obtained from humans.

▷ ChatGPT and Human Moral Judgments. Note: a) Distributions of human moral judgments (light blue) and GPT (light red) across six moral domains. Dashed lines represent means. b) Interrelationships between human moral values (N=3,902) and ChatGPT responses (N=1,000). c) Partial correlation networks among moral values based on different human samples from 19 countries (30) and 1,000 GPT responses. Blue edges indicate positive partial correlations; red edges indicate negative partial correlations.

In summary, LLMs ignore population diversity, cannot exhibit significant variance, and cannot replicate nomological networks—these shortcomings indicate that LLMs should not replace studies on Homo sapiens. However, this does not mean that psychological research should completely abandon the use of LLMs. On the one hand, applying psychological measurements traditionally used for humans to AI is indeed interesting, but interpretations of the results should be more cautious. On the other hand, when using LLMs as proxy models to simulate human behavior, their intermediate layer parameters can provide potential angles for exploring human cognitive behavior. However, this process should be conducted under strictly defined environments, agents, interactions, and outcomes.

Due to the “black box” nature of LLMs and the aforementioned situation where their outputs often differ from real human behavior, this expectation is still difficult to realize. But we can hope that in the future, more robust programs can be developed, making it more feasible for LLMs to simulate human behavior in psychological research.
 

2. Are Large Language Models a Panacea for Text Analysis?

Apart from their human-like qualities, the most significant feature of large language models (LLMs) is their powerful language processing capability. Applying natural language processing (NLP) methods to psychological research is not new. To understand why the application of LLMs has sparked considerable controversy today, we need to examine how their use differs from traditional NLP methods.

NLP methods utilizing pre-trained language models can be divided into two categories based on whether they involve parameter updates. Models involving parameter updates are further trained on specific task datasets. In contrast, zero-shot learning, one-shot learning, and few-shot learning do not require gradient updates; they directly leverage the capabilities of the pre-trained model to generalize from limited or no task-specific data, completing tasks by utilizing the model’s existing knowledge and understanding.

The groundbreaking leap in LLM capabilities—for example, their ability to handle multiple tasks without specific adjustments and their user-friendly designs that reduce the need for complex coding—has led to an increasing number of studies applying their zero-shot capabilities* to psychological text analysis, including sentiment analysis, offensive language detection, mindset assessment, and emotion detection.

*The zero-shot capability of LLMs refers to the model’s ability to understand and perform new tasks without having been specifically trained or optimized for those tasks. For example, a large language model can recognize whether a text is positive, negative, or neutral by understanding its content and context, even without targeted training data.
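To make the idea concrete, zero-shot text classification typically amounts to nothing more than a prompt. The minimal sketch below assumes a generic `llm_call` function (any provider-specific client could stand in for it) and is illustrative rather than a recommended research protocol.

```python
def classify_zero_shot(text, labels, llm_call):
    """Zero-shot classification by prompting: no task-specific training,
    just an instruction plus the candidate labels.

    `llm_call` is assumed to be any function mapping a prompt string to
    the model's text completion.
    """
    prompt = (
        "Classify the sentiment of the following text as one of "
        f"{', '.join(labels)}. Reply with the label only.\n\n"
        f"Text: {text}\nLabel:"
    )
    return llm_call(prompt).strip().lower()

# Example usage (with some provider-specific llm_call):
# classify_zero_shot("I finally feel hopeful again.",
#                    ["positive", "negative", "neutral"], llm_call)
```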

However, as applications deepen, more voices are pointing out the limitations of LLMs. First, LLMs may produce inconsistent outputs when faced with slight variations in prompts, and when aggregating multiple repeated outputs to different prompts, LLMs sometimes fail to meet the standards of scientific reliability. Additionally, Kocoń et al. [5] found that LLMs may encounter difficulties when handling complex, subjective tasks such as sentiment recognition. Lastly, reflecting on traditional fine-tuned models, the convenience of zero-shot applications of LLMs may not be as significantly different from model fine-tuning as commonly believed.

We should recognize that small language models fine-tuned for various tasks are also continuously developing, and more models are becoming publicly available today. Moreover, an increasing number of high-quality and specialized datasets are available for researchers to fine-tune language models. Although the zero-shot applications of LLMs may provide immediate convenience, the most straightforward choice is often not the most effective one, and researchers should maintain necessary caution when attracted by convenience.

To observe ChatGPT’s capabilities in text processing more intuitively, researchers set up three levels of models: zero-shot, few-shot, and fine-tuned, to extract moral values from online texts. This is a challenging task because even trained human annotators often disagree. The expression of moral values in language is usually extremely implicit, and due to length limitations, online posts often contain little background information. Researchers provided 2,983 social media posts containing moral or non-moral language to ChatGPT, asking it to judge whether the posts used any specific types of moral language. They then compared it with a small BERT model fine-tuned on a separate subset of social media posts, using human evaluators’ judgments as the standard.

The results showed that the fine-tuned BERT model performed far better than ChatGPT in the zero-shot setting; BERT achieved an F1 score of 0.48, while ChatGPT only reached 0.22. Even methods based on LIWC surpassed ChatGPT (zero-shot) in F1 score, reaching 0.27. ChatGPT’s predictions of moral sentiment were markedly extreme, while BERT showed no significant differences from trained human annotators in almost all cases.
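For readers unfamiliar with the metric, the F1 scores reported above (0.48, 0.22, and 0.27 in the study) are the harmonic mean of precision and recall. The short sketch below shows how such scores are computed with scikit-learn, using made-up labels rather than the study’s data.

```python
from sklearn.metrics import f1_score

# Hypothetical label arrays: 1 = post uses moral language, 0 = it does not.
# `human_labels` plays the role of the trained annotators' gold standard.
human_labels   = [1, 0, 1, 1, 0, 0, 1, 0]
zero_shot_pred = [1, 1, 1, 1, 1, 0, 1, 1]   # e.g. a zero-shot LLM
finetuned_pred = [1, 0, 1, 0, 0, 0, 1, 0]   # e.g. a fine-tuned BERT

print("zero-shot F1: ", f1_score(human_labels, zero_shot_pred))
print("fine-tuned F1:", f1_score(human_labels, finetuned_pred))
```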

Although LIWC is a smaller, simpler, and cheaper model, it deviates from trained human annotators both less often and less severely than ChatGPT does. As expected, few-shot learning and fine-tuning both improved ChatGPT’s performance in the experiment. We draw two conclusions. First, the cross-context flexibility claimed for LLMs may not always hold. Second, although LLMs are very convenient as “plug-and-play” tools, they may sometimes fail completely, and appropriate fine-tuning can mitigate these issues.

▷ Jean-Michel Bihorel

In addition to inconsistencies in text annotation, inadequacies in explaining complex concepts (such as implicit hate speech), and possible lack of depth in specialized or sensitive domains, the lack of interpretability is also a much-criticized aspect of LLMs. As powerful language analysis tools, LLMs derive their extensive functions from massive parameter sets, training data, and training processes. However, this increase in flexibility and performance comes at the cost of reduced interpretability and reproducibility. The so-called stronger predictive power of LLMs is an important reason why researchers in psychological text analysis tend to use neural network–based models. But if they cannot significantly surpass top-down methods, the advantages in interpretability of the latter may prompt psychologists and other social scientists to turn to more traditional models.

Overall, in many application scenarios, smaller (fine-tuned) models can be more powerful and less biased than current large (generative) language models. This is especially true when large language models are used in zero-shot and few-shot settings. For example, when exploring the language of online support forums for anxiety patients, researchers using smaller, specialized language models may be able to discover subtle details and specific language patterns directly related to the research field (e.g., worries, tolerance of uncertainty). This targeted approach can provide deeper insights into the experiences of anxiety patients, revealing their unique challenges and potential interventions. By leveraging specialized language models or top-down methods like CCR and LIWC, researchers can strike a balance between breadth and depth, enabling a more nuanced exploration of text data.

Nevertheless, as text analysis tools, LLMs may still perform valuable functions in cases where fine-tuning data is scarce, such as emerging concepts or under-researched groups. Their zero-shot capabilities enable researchers to explore pressing research topics. In these cases, adopting few-shot prompting methods may be both effective and efficient, as they require only a small number of representative examples.

Moreover, studies have shown that LLMs can benefit from theory-driven methods. Based on this finding, developing techniques that combine the advantages of both approaches is a promising direction for future research. With the rapid advancement of large language model technology, solving performance and bias issues is only a matter of time, and it is expected that these challenges will be effectively alleviated in the near future.
 

3. Reproducibility Cannot Be Ignored

Reproducibility refers to the ability to replicate and verify results using the same data and methods. However, the black-box nature of LLMs makes related research findings difficult to reproduce. For studies that rely on data or analyses generated by LLMs, this limitation poses a significant obstacle to achieving reproducibility.

For example, after an LLM is updated, its preferences may change, potentially affecting the effectiveness of previously established “best practices” and “debiasing strategies.” Currently, ChatGPT and other closed-source models do not provide their older versions, which limits researchers’ ability to reproduce results using models from specific points in time. For instance, once the “gpt-3.5-January-2023” version is updated, its parameters and generated outputs may change, challenging the rigor of scientific research. Importantly, new versions do not guarantee the same or better performance on all tasks. For example, GPT-3.5 and GPT-4 have been reported to produce inconsistent results on various text analysis tasks—GPT-4 sometimes performs worse than GPT-3.5 [6]—which further deepens concerns about non-transparent changes in the models.
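One partial mitigation, regardless of provider, is to record the exact model identifier and decoding parameters alongside every output. The sketch below is a minimal, assumption-laden example of such logging (the model name shown is only illustrative); it cannot restore access to retired model versions, but it at least documents what was used.

```python
import json
import datetime

def log_llm_run(model_name, prompt, params, output, path="llm_runs.jsonl"):
    """Append the exact model identifier, decoding parameters, prompt, and
    output of each call, so later readers at least know which model
    version and settings produced the reported results."""
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "model": model_name,   # e.g. "gpt-3.5-turbo-0125" (illustrative, provider-specific)
        "params": params,      # temperature, max tokens, seed if supported, ...
        "prompt": prompt,
        "output": output,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```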

Beyond considering the black-box nature of LLMs from the perspective of open science, researchers are more concerned with the scientific spirit of “knowing what it is and why it is so.” When obtaining high-quality and informative semantic representations, we should focus more on the algorithms used to generate these outputs rather than the outputs themselves. In the past, one of the main advantages of computational models was that they allowed us to “peek inside”; certain psychological processes that are difficult to test can be inferred through models. Therefore, using proprietary LLMs that do not provide this level of access may hinder researchers in psychology and other fields from benefiting from the latest advances in computational science.

 

▷ Stuart McReath
 

4. Conclusion

The new generation of online service-oriented large language models (LLMs) developed for the general public—such as ChatGPT, Gemini, and Claude—provides many researchers with tools that are both powerful and easy to use. However, as these tools become more popular and user-friendly, researchers have a responsibility to maintain a clear understanding of both the capabilities and limitations of these models. Particularly in certain tasks, the excellent performance and high interactivity of LLMs may lead people to mistakenly believe that they are always the best choice as research subjects or automated text analysis assistants. Such misconceptions can oversimplify people’s understanding of these complex tools and result in unwise decisions. For example, avoiding necessary fine-tuning for the sake of convenience or due to a lack of understanding may prevent full utilization of their capabilities, ultimately leading to relatively poor outcomes. Additionally, it may cause researchers to overlook unique challenges related to transparency and reproducibility.

We also need to recognize that many advantages attributed to LLMs exist in other models as well. For instance, BERT or open-source LLMs can be accessed via APIs, providing researchers who cannot self-host these technologies with a convenient and low-cost option. This enables these models to be widely used without requiring extensive coding or technical expertise. Additionally, OpenAI offers embedding models like “text-embedding-ada-002,” which can be used for downstream tasks in much the same way as BERT embeddings.

Ultimately, the responsible use of any computational tool requires us to fully understand its capabilities and carefully consider whether it is the most suitable method for the current task. This balanced approach ensures that technological advances are utilized effectively and responsibly in research.

 

How to Bridge the Gap Between Artificial Intelligence and Human Intelligence?

1. Artificial Intelligence vs. Human Intelligence

1.1 How did early artificial intelligence models draw inspiration from our understanding of the brain?

The early development of artificial intelligence was greatly influenced by our understanding of the human brain. In the mid-20th century, advancements in neuroscience and initial insights into brain function led scientists to attempt applying these biological concepts to the development of machine intelligence.

In 1943, neurophysiologist Warren McCulloch and mathematician Walter Pitts introduced the “McCulloch-Pitts neuron model,” one of the earliest attempts in this area. This model described the activity of neurons using mathematical logic and, although simplistic, laid the groundwork for later artificial neural networks.

▷ Fig.1: Structure of a neuron and the McCulloch-Pitts neuron model.
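A minimal illustration of the McCulloch-Pitts idea, written in modern Python rather than the original notation: binary inputs, fixed weights, and a hard threshold.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """McCulloch-Pitts unit: binary inputs, fixed weights, hard threshold.
    Fires (returns 1) only if the weighted sum reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# A two-input AND gate: only fires when both inputs are 1.
print(mcculloch_pitts([1, 1], [1, 1], threshold=2))  # 1
print(mcculloch_pitts([1, 0], [1, 1], threshold=2))  # 0
```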

During this period, brain research primarily focused on how neurons process information and interact within complex networks via electrical signals. These studies inspired early AI researchers to design primitive artificial neural networks.

In the 1950s, Frank Rosenblatt invented the perceptron, an algorithm inspired by the biological visual system that simulated how the retina processes incoming light. Although rudimentary, it marked a significant step forward for machine learning.

 

▷ Fig.2: Left: Rosenblatt’s physical perceptron; Right: Structure of the perceptron system.
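The perceptron added a learning rule to this thresholded unit: weights are nudged toward misclassified examples. Below is a compact sketch on a toy, linearly separable problem (not Rosenblatt’s original hardware or data).

```python
def train_perceptron(samples, lr=0.1, epochs=20):
    """Rosenblatt-style perceptron learning on 2-D binary-labelled data:
    nudge the weights and bias toward misclassified examples."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - pred
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

# An AND-like toy problem: label 1 only when both inputs are 1.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train_perceptron(data))
```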

In addition to the influence of neuroscience, early cognitive psychology research also contributed to the development of AI. Cognitive psychologists sought to understand how humans perceive, remember, think, and solve problems, providing a methodological foundation for simulating human intelligent behavior in AI. For instance, the Logic Theorist, developed by Allen Newell and Herbert A. Simon, could prove mathematical theorems [1-3], simulating the human problem-solving process and, to some extent, mimicking the logical reasoning involved in human thought.

Although these early models were simple, their development and design were profoundly shaped by contemporary understandings of the brain, which established a theoretical and practical foundation for the development of more complex systems. Through such explorations, scientists gradually built intelligent systems capable of mimicking or even surpassing human performance in specific tasks, driving the evolution and innovation of artificial intelligence technology.

1.2 Development of Artificial Intelligence

Since then, the field of artificial intelligence has experienced cycles of “winters” and “revivals.” In the 1970s and 1980s, improvements in computational power and algorithmic innovations, such as the introduction of the backpropagation algorithm, made it possible to train deeper neural networks. Although artificial intelligence achieved commercial success in certain areas during this period, such as expert systems, technological limitations and inflated expectations ultimately led to an AI winter.

Entering the 21st century, and especially after 2010, the field of artificial intelligence has witnessed unprecedented advancement. The exponential growth of data, the proliferation of high-performance computing resources such as GPUs, and further optimization of algorithms established deep learning as the main driving force behind AI development.

The core of deep learning remains the simulation of how brain neurons process information, and its applications have far surpassed initial expectations, encompassing numerous fields such as image recognition, natural language processing, autonomous vehicles, and medical diagnostics. These groundbreaking advancements have not only driven technological progress but also fostered the emergence of new business models and rapid industry development.

▷ Giordano Poloni

1.3 Differences Between Artificial Intelligence and Human Intelligence

1.3.1 Differences in Functional Performance

Although artificial intelligence has surpassed human capabilities in specific domains such as board games and certain image and speech recognition tasks, it generally lacks cross-domain adaptability. While some AI systems, particularly deep learning models, excel in big data environments, they typically require vast amounts of labeled data for training, and their transfer learning abilities are limited when tasks or environments change, often necessitating the design of specific algorithms. In contrast, the human brain possesses a robust capacity for learning and adaptation: it can learn new tasks from minimal data under varied conditions and can transfer knowledge gained in one domain to seemingly unrelated areas.

In terms of flexibility in addressing complex problems, AI performs best with well-defined and structured issues, such as board games and language translation. However, its efficiency drops when dealing with ambiguous and unstructured problems, making it susceptible to interference. The human brain exhibits high flexibility and efficiency in processing vague and complex environmental information; for instance, it can recognize sounds in noisy environments and make decisions despite incomplete information.

Regarding consciousness and cognition, current AI systems lack true awareness and emotions. Their “decisions” are purely algorithmic outputs based on data, devoid of subjective experience or emotional involvement. Humans, on the other hand, not only process information but also possess consciousness, emotions, and subjective experiences, which are essential components of human intelligence.

In multitasking, while some AI systems can handle multiple tasks simultaneously, this often requires complex, targeted designs. Most AI systems are designed for single tasks, and their efficiency and effectiveness in multitasking generally do not match those of the human brain, which can switch rapidly between tasks while maintaining high efficiency.

In terms of energy consumption and efficiency, advanced AI systems, especially large machine learning models, often demand significant computational resources and energy, far exceeding that of the human brain. The brain operates on about 20 watts, showcasing exceptionally high information processing efficiency.

Overall, while artificial intelligence has demonstrated remarkable performance in specific areas, it still cannot fully replicate the human brain, particularly in flexibility, learning efficiency, and multitasking. Future AI research may continue to narrow these gaps, but the complexity and efficiency of the human brain remain benchmarks that are difficult to surpass.

▷ Spooky Pooka ltd

1.3.2 Differences in Underlying Mechanisms

In terms of structure, modern AI systems, especially neural networks, are inspired by biological neural networks, yet the “neurons” (typically computational units) and their interconnections rely on numerical simulations. The connections and processing in these artificial neural networks are usually pre-set and static, lacking the dynamic plasticity of biological neural networks. The human brain comprises approximately 86 billion neurons, each connected to thousands to tens of thousands of other neurons via synapses [6-8], supporting complex parallel processing and highly dynamic information exchange.

Regarding signal transmission, AI systems transmit signals through numerical calculations. For instance, in neural networks, the output of a neuron is a function of the weighted sum of its inputs, passed through simple mathematical functions such as the sigmoid or ReLU. In biological neurons, by contrast, signal transmission relies on electrochemical processes: information exchange between neurons occurs through the release of neurotransmitters at synapses, regulated by various biochemical processes.

In terms of learning mechanisms, AI learning typically adjusts parameters (such as weights) through algorithms, such as backpropagation. Although this method is technically effective, it requires substantial amounts of data and necessitates retraining or significant adjustment of model parameters for new datasets, highlighting a gap compared to the brain’s continuous and unsupervised learning approach. Learning in the human brain relies on synaptic plasticity, where the strength of neural connections changes based on experience and activity, supporting ongoing learning and memory formation.
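The contrast can be sketched for a single linear unit: a gradient-based update needs an explicit error signal computed against a target, whereas a Hebbian-style update uses only locally available pre- and postsynaptic activity. This is a schematic comparison; real biological plasticity rules (LTP, LTD, spike-timing dependence) are far richer.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)          # presynaptic activity / input vector
w = rng.normal(size=5)          # synaptic weights
target = 1.0
lr = 0.01

# Gradient-style update (as in backpropagation, shown for a single linear
# unit with squared error): weights move along the error gradient, which
# requires a labelled target.
y = w @ x
w_backprop = w - lr * 2 * (y - target) * x

# Hebbian-style update ("cells that fire together wire together"): weights
# grow with the product of pre- and postsynaptic activity, with no explicit
# error signal or target.
w_hebbian = w + lr * y * x
```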

1.4 Background and Definition of the Long-Term Goal of Simulating Human Intelligence—Artificial General Intelligence

The concept of Artificial General Intelligence (AGI) arose from recognizing the limitations of narrow artificial intelligence (AI). Narrow AI typically focuses on solving specific, well-defined problems, such as board games or language translation, but lacks the flexibility to adapt across tasks and domains. As technology advanced and our understanding of human intelligence deepened, scientists began to envision an intelligent system with human-like cognitive abilities, self-awareness, creativity, and logical reasoning across multiple domains.

AGI aims to create an intelligent system capable of understanding and solving problems across various fields, with the ability to learn and adapt independently. This system would not merely serve as a tool; instead, it would participate as an intelligent entity in human socio-economic and cultural activities. The proposal of AGI represents the ideal state of AI development, aspiring to achieve and surpass human intelligence in comprehensiveness and flexibility.

 

2. Pathways to Achieving Artificial General Intelligence

Diverse neuron simulations and network structures exhibit varying levels of complexity. Neurons with richer dynamic descriptions possess higher internal complexity, while networks with wider and deeper connections exhibit greater external complexity. From the perspective of complexity, there are currently two promising pathways to achieve Artificial General Intelligence: one is the external complexity large model approach, which involves increasing the width and depth of the model; the other is the internal complexity small model approach, which entails adding ion channels to the model or transforming it into a multi-compartment model.

▷ Fig.3: Internal and external complexity of neurons and networks.

 

2.1 External Complexity Large Model Approach

In the field of artificial intelligence (AI), researchers increasingly rely on the development of large AI models to tackle broader and more complex problems. These models typically feature deeper, larger, and wider network structures, known as the “external complexity large model approach.” The core of this method lies in enhancing the model’s ability to process information (especially when dealing with large data sets) and learn by scaling up the model.

2.1.1 Applications of Large Language Models

Large language models, such as GPT (Generative Pre-trained Transformer) and BERT (Bidirectional Encoder Representations from Transformers), are currently hot topics in AI research. These models learn from vast text data using deep neural networks, master the deep semantics and structures of language, and demonstrate exceptional performance in various language processing tasks. For instance, GPT-3, trained on a massive text dataset, not only generates high-quality text but also performs tasks like question answering, summarization, and translation.

The primary applications of these large language models include natural language understanding, text generation, and sentiment analysis, making them widely applicable in fields such as search engines, social media analysis, and automated customer service.

2.1.2 Why Expand Model Scale?

According to research by Jason Wei, Yi Tay, William Fedus, and others in “Emergent Abilities of Large Language Models,” as the model size increases, a phenomenon of “emergence” occurs, where certain previously latent capabilities suddenly become apparent. This is due to the model’s ability to learn deeper patterns and associations by processing more complex and diverse information.

For example, ultra-large language models can exhibit problem-solving capabilities for complex reasoning tasks and creative writing without specific targeted training. This phenomenon of “emergent intelligence” suggests that by increasing model size, a broader cognitive and processing capability closer to human intelligence can be achieved.

▷ Fig.4: The emergence of large language models.

2.1.3 Challenges

Despite the unprecedented capabilities brought by large models, they face significant challenges, particularly concerning efficiency and cost.

First, these models require substantial computational resources, including high-performance GPUs and extensive storage, which directly increases research and deployment costs. Second, the energy consumption of large models is increasingly concerning, affecting their sustainability and raising environmental issues. Additionally, training these models requires vast amounts of data input, which may lead to data privacy and security issues, especially when sensitive or personal information is involved. Finally, the complexity and opacity of large models may render their decision-making processes difficult to interpret, which could pose serious problems in fields like healthcare and law, where high transparency and interpretability are crucial.

2.2 Internal Complexity Small Model Approach


2.2.1 Theoretical Foundations

Neurons are the fundamental structural and functional units of the nervous system, primarily composed of cell bodies, axons, dendrites, and synapses. These components work together to receive, integrate, and transmit information. The following sections will introduce the theoretical foundations of neuron simulation, covering neuron models, the conduction of electrical signals in neuronal processes (dendrites and axons), synapses and synaptic plasticity models, and models with complex dendrites and ion channels.

▷ Fig.5: Structure of a neuron.

 

(1) Neuron Models

Ion Channels

Ion channels and pumps in neurons are crucial membrane proteins that regulate the transmission of neural electrical signals. They control the movement of ions across the cell membrane, thereby influencing the electrical activity and signal transmission of neurons. These structures ensure that neurons can maintain resting potentials and generate action potentials, forming the foundation of the nervous system’s function.

Ion channels are protein channels embedded in the cell membrane that regulate the passage of specific ions (such as sodium, potassium, calcium, and chloride). Various factors, including voltage changes, chemical signals, and mechanical stress, control the opening and closing of these ion channels, impacting the electrical activity of neurons.

▷ Fig.6: Ion channels and ion pumps for neurons.

Equivalent Circuit

The equivalent circuit model simulates the electrophysiological properties of neuronal membranes using circuit components, allowing complex biological electrical phenomena to be explained and analyzed within the framework of physics and engineering. This model typically includes three basic components: membrane capacitance, membrane resistance, and a power source.

The cell membrane of a neuron exhibits capacitive properties related to its phospholipid bilayer structure. The hydrophobic core of the lipid bilayer prevents the free passage of ions, resulting in high electrical insulation of the cell membrane. When the ion concentrations differ on either side of the cell membrane, especially under the regulation of ion pumps, charge separation occurs. Due to the insulating properties of the cell membrane, this charge separation creates an electric field that allows the membrane to store charge.

Capacitance elements are used to simulate this charge storage capability, with capacitance values depending on the membrane’s area and thickness. Membrane resistance is primarily regulated by the opening and closing of ion channels, directly affecting the rate of change of membrane potential and the cell’s response to current input. The power source represents the electrochemical potential difference caused by the ion concentration gradient across the membrane, which drives the maintenance of resting potential and the changes in action potential.

▷ Fig.7: Schematic diagram of the equivalent circuit.

Hodgkin-Huxley Model

Building on the idea of equivalent circuits, Alan Hodgkin and Andrew Huxley proposed the Hodgkin-Huxley (HH) model in the 1950s, drawing on their experimental work on the squid giant axon. The model includes conductances for sodium (Na), potassium (K), and leak currents, representing the degree to which each type of ion channel is open. The opening and closing of the ion channels are further described by voltage- and time-dependent gating variables (m, h, n). The equations of the HH model are as follows:

$$C_m \frac{dV}{dt} = I - \bar{g}_K\, n^4 (V - E_K) - \bar{g}_{Na}\, m^3 h\, (V - E_{Na}) - \bar{g}_L (V - E_L)$$

where $V$ is the membrane potential; $I$ is the input current; $\bar{g}_K$, $\bar{g}_{Na}$, and $\bar{g}_L$ are the maximum conductances for potassium, sodium, and leak currents, respectively; $E_K$, $E_{Na}$, and $E_L$ are the corresponding equilibrium potentials; and $n$, $m$, and $h$ are the gating variables associated with the states of the potassium and sodium channels.

The gating dynamics are described by differential equations of the form

$$\frac{dx}{dt} = \alpha_x(V)\,(1 - x) - \beta_x(V)\,x, \qquad x \in \{n, m, h\}$$

The $\alpha$ and $\beta$ functions represent the rates of channel opening and closing, which Hodgkin and Huxley determined experimentally using the voltage-clamp technique.
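As a rough illustration of how these equations are used in practice, the following is a minimal forward-Euler simulation with commonly published parameter values and rate functions; a serious simulation would rely on a dedicated simulator (e.g., NEURON or Brian2) and a more careful integration scheme.

```python
import numpy as np

# Commonly used parameter values (membrane-area-normalised units):
C_m, g_Na, g_K, g_L = 1.0, 120.0, 36.0, 0.3          # uF/cm^2, mS/cm^2
E_Na, E_K, E_L = 50.0, -77.0, -54.387                 # mV

# Empirical rate functions alpha_x(V), beta_x(V) for the gating variables
# (removable singularities at isolated voltages are ignored in this sketch).
a_n = lambda V: 0.01 * (V + 55) / (1 - np.exp(-(V + 55) / 10))
b_n = lambda V: 0.125 * np.exp(-(V + 65) / 80)
a_m = lambda V: 0.1 * (V + 40) / (1 - np.exp(-(V + 40) / 10))
b_m = lambda V: 4.0 * np.exp(-(V + 65) / 18)
a_h = lambda V: 0.07 * np.exp(-(V + 65) / 20)
b_h = lambda V: 1.0 / (1 + np.exp(-(V + 35) / 10))

def simulate_hh(I=10.0, T=50.0, dt=0.01):
    """Forward-Euler integration of the HH equations under a constant
    injected current I (uA/cm^2). Returns the membrane-potential trace."""
    V, n, m, h = -65.0, 0.317, 0.053, 0.596           # approximate resting state
    trace = []
    for _ in range(int(T / dt)):
        I_ion = (g_K * n**4 * (V - E_K)
                 + g_Na * m**3 * h * (V - E_Na)
                 + g_L * (V - E_L))
        V += dt * (I - I_ion) / C_m
        n += dt * (a_n(V) * (1 - n) - b_n(V) * n)
        m += dt * (a_m(V) * (1 - m) - b_m(V) * m)
        h += dt * (a_h(V) * (1 - h) - b_h(V) * h)
        trace.append(V)
    return np.array(trace)

voltage = simulate_hh()   # repetitive firing appears for sufficiently large I
```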

Leaky Integrate-and-Fire Model (LIF)

The Leaky Integrate-and-Fire model (LIF) is a commonly used mathematical model in neuroscience that simplifies the action potential of neurons. This model focuses on describing the temporal changes in membrane potential [4-5] while neglecting the complex ionic dynamics within biological neurons.

Scientists have found that when a continuous current input is applied to a neuron [6-7], the membrane potential rises until it reaches a certain threshold, leading to the firing of an action potential, after which the membrane potential rapidly resets and the process repeats. Although the LIF model does not describe the specific dynamics of ion channels, its high computational efficiency has led to its widespread application in neural network modeling and theoretical neuroscience research. Its basic equation is as follows:

$$\tau_m \frac{dV}{dt} = -\left(V - V_{rest}\right) + R_m I(t), \qquad \tau_m = R_m C_m$$

where $V$ is the membrane potential; $V_{rest}$ is the resting membrane potential; $I(t)$ is the input current; $R_m$ is the membrane resistance; $\tau_m$ is the membrane time constant, reflecting the rate at which the membrane potential responds to input current; and $C_m$ is the membrane capacitance.

In this model, when the membrane potential reaches a specific threshold value $V_{th}$, the neuron fires an action potential (spike). Subsequently, the membrane potential is reset to a lower value $V_{reset}$ to simulate the actual process of neuronal firing.
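Below is a minimal simulation of this model, with illustrative parameter values chosen only so that the constant input drives regular spiking.

```python
def simulate_lif(I=2.0, T=200.0, dt=0.1,
                 tau_m=10.0, R_m=10.0, V_rest=-70.0,
                 V_th=-55.0, V_reset=-75.0):
    """Leaky integrate-and-fire neuron driven by a constant current I:
    integrate the leaky membrane equation and emit a spike whenever the
    potential crosses threshold, then reset. Units are illustrative
    (ms, mV, arbitrary current units)."""
    V = V_rest
    spike_times = []
    for step in range(int(T / dt)):
        dV = (-(V - V_rest) + R_m * I) / tau_m
        V += dt * dV
        if V >= V_th:
            spike_times.append(step * dt)
            V = V_reset
    return spike_times

print(simulate_lif())   # regular spiking when R_m * I exceeds (V_th - V_rest)
```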

(2) Conduction of Electrical Signals in Neuronal Processes (Cable Theory)

In the late 19th to early 20th centuries, scientists began to recognize that electrical signals in neurons could propagate through elongated neural fibers such as axons and dendrites. However, as the distance increases, signals tend to attenuate. Researchers needed a theoretical tool to explain the propagation of electrical signals in neural fibers, particularly the voltage changes over long distances.

In 1907, physicist Wilhelm Hermann proposed a simple theoretical framework that likened nerve fibers to cables to describe the diffusion process of electrical signals. This theory was later further developed in the mid-20th century by Hodgkin, Huxley, and others, who confirmed the critical role of ion flow in signal propagation through experimental measurements of neurons and established mathematical models related to cable theory.

The core idea of cable theory is to treat nerve fibers as segments of a cable, introducing electrical parameters such as resistance and capacitance to simulate the propagation of electrical signals (typically action potentials) within nerve fibers. Nerve fibers, such as axons and dendrites, are viewed as one-dimensional cables, with electrical signals propagating along the length of the fiber; membrane electrical activity is described through resistance and capacitance, with current conduction influenced by internal resistance and membrane leakage resistance; the signal gradually attenuates as it propagates through the fiber.

▷ Fig.8: Schematic diagram of cable theory.

The cable equation is:

$$c_M \frac{\partial V(x,t)}{\partial t} = \frac{a}{2\rho_L} \frac{\partial^2 V(x,t)}{\partial x^2} - i_{ion}$$

where $c_M$ represents the membrane capacitance per unit area, reflecting the role of the neuronal membrane as a capacitor; $a$ is the radius of the nerve fiber, which affects the propagation range of the electrical signal; $\rho_L$ is the resistivity of the nerve fiber’s axial cytoplasm, describing the ease of current propagation along the fiber; and $i_{ion}$ is the ionic current density, representing the flow of ion currents through the membrane.

The temporal term $c_M\, \partial V(x,t)/\partial t$ reflects how the membrane potential changes over time; the spatial term $(a / 2\rho_L)\, \partial^2 V(x,t)/\partial x^2$ describes the gradual spread and attenuation of the signal along the nerve fiber, which depends on the fiber’s resistance and geometry. The term $i_{ion}$ denotes the ionic current through the membrane, which governs the generation and recovery of the action potential; the opening of ion channels is fundamental to signal propagation.

 

(3) Multi-Compartment Model

In earlier neuron modeling, such as the HH model and cable theory, neurons were simplified to a point-like “single compartment,” only considering temporal changes in membrane potential while neglecting the spatial distribution of various parts of the neuron. These models are suitable for describing the mechanisms of action potential generation but fail to fully explain signal propagation in the complex morphological structures of neurons (such as dendrites and axons).

As neuroscience deepened its understanding of the complexity of neuronal structures, scientists recognized that voltage changes in different parts of the neuron can vary significantly, especially in neurons with long dendrites. Signal propagation in dendrites and axons is influenced not only by the spatial diffusion of electrical signals but also by structural complexity, resulting in different responses. Thus, a more refined model was needed to describe the spatial propagation of electrical signals in neurons, leading to the development of the multi-compartment model.

The core idea of the multi-compartment model is to divide the neuron’s dendrites, axons, and cell body into multiple interconnected compartments, with each compartment described using equations similar to those of cable theory to model the changes in transmembrane potential  over time and space. By connecting multiple compartments, the model simulates the complex propagation pathways of electrical signals within neurons and reflects the voltage differences between different compartments. This approach allows for precise description of electrical signal propagation in the complex morphology of neurons, particularly the attenuation and amplification of electrical signals on dendrites.

Specifically, neurons are divided into multiple compartments, each representing a portion of the neuron (such as dendrites, axons, or a segment of the cell body). Each compartment is represented by a circuit model, with resistance and capacitance used to describe the electrical properties of the membrane. The transmembrane potential is determined by factors such as current injection, diffusion, and leakage. Adjacent compartments are connected by resistors, and electrical signals propagate between compartments through these connections. The transmembrane potential Vi follows a differential equation similar to cable theory in the i-th compartment:

$$C_i \frac{dV_i}{dt} = -I_{mem,i}(t) + \sum_{j \in \mathcal{N}(i)} \frac{V_j - V_i}{R_{axial,ij}}$$

where $C_i$ is the membrane capacitance of compartment $i$, $I_{mem,i}(t)$ is its membrane current, and $R_{axial,ij}$ is the axial resistance between neighboring compartments $i$ and $j$. These coupled equations describe how the signal propagates and attenuates within different compartments of the neuron.

In the multi-compartment model, certain compartments (such as the cell body or initial segment) can generate action potentials, while others (like dendrites or axons) primarily facilitate the propagation and attenuation of electrical signals. Signals are transmitted through connections between different compartments, with input signals in the dendritic region ultimately integrated at the cell body to trigger action potentials, which then propagate along the axon.

Compared to single-compartment models, the multi-compartment model can better reflect the complexity of neuronal morphological structures, particularly in the propagation of electrical signals within structures like dendrites and axons. Due to the coupling differential equations involving multiple compartments, the multi-compartment model often requires numerical methods (such as the Euler method or Runge-Kutta method) for solution.
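As an illustration of the numerical approach, the sketch below integrates a purely passive chain of compartments with forward Euler, using arbitrary illustrative units; real multi-compartment models add active conductances and branch geometry and are usually built in simulators such as NEURON.

```python
import numpy as np

def simulate_compartments(n=50, T=20.0, dt=0.01,
                          C=1.0, g_leak=0.1, E_leak=-65.0,
                          g_axial=0.5, I_inj=2.0):
    """Forward-Euler simulation of a passive multi-compartment chain:
    each compartment has a leak current and a capacitance, neighbours are
    coupled by an axial conductance, and current is injected into
    compartment 0. Units and values are illustrative, not fitted to any cell."""
    V = np.full(n, E_leak)
    for _ in range(int(T / dt)):
        I_leak = g_leak * (V - E_leak)
        # Axial coupling with both neighbours (sealed, zero-flux ends).
        I_axial = np.zeros(n)
        I_axial[1:] += g_axial * (V[:-1] - V[1:])
        I_axial[:-1] += g_axial * (V[1:] - V[:-1])
        I_ext = np.zeros(n)
        I_ext[0] = I_inj
        V += dt * (I_ext - I_leak + I_axial) / C
    return V

profile = simulate_compartments()
# The steady-state profile decays with distance from the injection site,
# the discrete analogue of the attenuation described by cable theory.
```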

2.2.2 Why Conduct Complex Dynamic Simulations of Biological Neurons?

Research by Beniaguev et al. has shown that the complex dendritic structures and ion channels of different types of neurons in the brain enable a single neuron to possess extraordinary computational capabilities, comparable to those of a 5-8 layer deep learning network [8].

▷ Fig.9: A model of Layer 5 cortical pyramidal neurons with AMPA and NMDA synapses, accurately simulated using a Temporal Convolutional Network (TCN) with seven hidden layers, each containing 128 feature maps, and a history duration of 153 milliseconds.

He et al. focused on the relationships between different internal dynamics and complexities of neuron models [9]. They proposed a method for converting external complexity into internal complexity, noting that models with richer internal dynamics exhibit certain computational advantages. Specifically, they theoretically demonstrated the equivalence of dynamic characteristics between the LIF model and the HH model, showing that an HH neuron can be dynamically equivalent to four time-varying parameter LIF neurons (tv-LIF) with specific connection structures.

▷ Fig.10: A method for converting from the tv-LIF model to the HH model.

Building on this, they experimentally validated the effectiveness and reliability of HH networks in handling complex tasks. They discovered that the computational efficiency of HH networks is significantly higher compared to simplified tv-LIF networks (s-LIF2HH). This finding demonstrates that converting external complexity into internal complexity can enhance the computational efficiency of deep learning models. It suggests that the internal complexity small model approach, inspired by the complex dynamics of biological neurons, holds promise for achieving more powerful and efficient AI systems.

▷ Fig.11: Computational resource analysis of the LIF model, HH model, and s-LIF2HH.

Moreover, due to structural and computational mechanism limitations, existing artificial neural networks differ greatly from real brains, making them unsuitable for directly understanding the mechanisms of real brain learning and perception tasks. Compared to artificial neural networks, neuron models with rich internal dynamics are closer to biological reality. They play a crucial role in understanding the learning processes of real brains and the mechanisms of human intelligence.

2.3 Challenges

Despite the impressive performance of the internal complexity small model approach, it faces a series of challenges. The electrophysiological activity of neurons is often described by complex nonlinear differential equations, making model analysis and solution quite challenging. Due to the nonlinear and discontinuous characteristics of neuron models, using traditional gradient descent methods for learning becomes complex and inefficient. Furthermore, increasing internal complexity, as seen in models like HH, reduces hardware parallelism and slows down information processing speed. This necessitates corresponding innovations and improvements in hardware.

To tackle these challenges, researchers have developed various improved learning algorithms. For example, approximate gradients are used to address discontinuous characteristics, while second-order optimization algorithms capture curvature information of the loss function more accurately to accelerate convergence. The introduction of distributed learning and parallel computing allows the training process of complex neuron networks to be conducted more efficiently on large-scale computational resources.

Additionally, bio-inspired learning mechanisms have garnered interest from some scholars. The learning processes of biological neurons differ significantly from current deep learning methods. For instance, biological neurons rely on synaptic plasticity for learning, which includes the strengthening and weakening of synaptic connections, known as long-term potentiation (LTP) and long-term depression (LTD). This mechanism is not only more efficient but also reduces the model’s dependence on continuous signal processing, thereby lowering the computational burden.

▷ MJ

3. Bridging the Gap Between Artificial Intelligence and Human Brain Intelligence

He et al. theoretically validated and demonstrated through simulation that smaller networks with greater internal complexity can replicate the functions of larger, simpler networks. This approach not only maintains performance but also improves computational efficiency, cutting memory usage to roughly a quarter and doubling processing speed. This suggests that increasing internal complexity may be an effective way to improve AI performance and efficiency.

Zhu and Eshraghian commented on He et al.’s article, “Network Model with Internal Complexity Bridges Artificial Intelligence and Neuroscience” [5]. They noted, “The debate over internal and external complexity in AI remains unresolved, with both approaches likely to play a role in future advancements. By re-examining and deepening the connections between neuroscience and AI, we may discover new methods for constructing more efficient, powerful, and brain-like artificial intelligence systems.”

As we stand at the crossroads of AI development, the field faces a critical question: Can we achieve the next leap in AI capabilities by more precisely simulating the dynamics of biological neurons, or will we continue to advance with larger models and more powerful hardware? Zhu and Eshraghian suggest that the answer may lie in integrating both approaches, which will continuously optimize as our understanding of neuroscience deepens.

Although the introduction of biological neuron dynamics has enhanced AI capabilities to some extent, we are still far from achieving the technological level required to simulate human consciousness. First, the completeness of the theory remains insufficient. Our understanding of the nature of consciousness is lacking, and we have yet to develop a comprehensive theory capable of explaining and predicting conscious phenomena. Second, simulating consciousness may require high-performance computational frameworks that current hardware and algorithm efficiencies cannot yet support. Moreover, efficient training algorithms for brain models remain a challenge. The nonlinear behavior of complex neurons complicates model training, necessitating new optimization methods. Many complex brain functions, such as long-term memory retention, emotional processing, and creativity, still require in-depth exploration of their specific neural and molecular mechanisms. How to further simulate these behaviors and their molecular mechanisms in artificial neural networks remains an open question. Future research must make breakthroughs on these issues to truly advance the simulation of human consciousness and intelligence.

Interdisciplinary collaboration is crucial for simulating human consciousness and intelligence. Cooperative research across mathematics, neuroscience, cognitive science, philosophy, and computer science will deepen our understanding and simulation of human consciousness and intelligence. Only through collaboration among different disciplines can we form a more comprehensive theoretical framework and advance this highly challenging task.

A Quiet Revolution: How Focused Ultrasound is Redefining Non-Invasive Brain Treatment

From the moment we are born, our brains continuously receive auditory information from the external world. Language, carried by sound waves, shapes our cognition, and music evokes aesthetic experiences in our minds. When the frequency surpasses the range detectable by the human ear, ultrasound waves can also affect the brain. Among the continuously advancing technologies in recent years is focused ultrasound. This technology is akin to using a convex lens to concentrate sunlight and ignite a fire; by focusing ultrasound waves on a specific point, it generates powerful energy, achieving therapeutic effects through a relatively non-invasive method. Its application in biomedicine, particularly in brain science, is initiating a revolutionary transformation.

 

1. Principles

(1) Physical Basis

The sounds we hear in daily life fall within the range audible to humans, roughly 20 Hz to 20,000 Hz. Focused Ultrasound (FUS), by contrast, uses sound waves of much higher frequency, far beyond the range of human hearing.

During the propagation of ultrasound waves, interference occurs, meaning the waves can mutually enhance or cancel each other out. By strategically arranging multiple ultrasound transducers, we can harness this interference to concentrate the energy of ultrasound waves at a specific focal point. This technique of focusing using ultrasound interference is known as FUS.

In an FUS system, each transducer can independently control the phase of the sound wave. By precisely calculating the phase of each transducer, a desired focal point can be produced. However, in practical applications, the shape and size of the focal point are also influenced by other factors, such as the propagation characteristics of the sound waves in different materials and the acoustic properties of the materials as they vary with temperature and frequency.
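As a rough illustration of the per-element phase control described above, the sketch below uses a simplified, hypothetical geometry (a 16-element linear array and a single focal point, not any particular clinical system) to compute the firing delay, and the equivalent phase offset, that lets every element's wavefront arrive at the focus at the same time. A real FUS system would further correct these values for skull-induced aberrations, as discussed later in this article.

```python
import numpy as np

C_TISSUE = 1540.0      # assumed speed of sound in soft tissue, m/s
FREQ = 650e3           # illustrative driving frequency, Hz

# Hypothetical 16-element linear array along x (metres); focal point at (0, 0.08)
elements = np.column_stack([np.linspace(-0.04, 0.04, 16), np.zeros(16)])
focus = np.array([0.0, 0.08])

# Travel time from each element to the focus
dist = np.linalg.norm(elements - focus, axis=1)
t_flight = dist / C_TISSUE

# Delay each element so that all wavefronts arrive at the focus together:
# the farthest element fires first (zero delay), nearer ones wait.
delays = t_flight.max() - t_flight
phases = 2 * np.pi * FREQ * delays          # equivalent phase offsets (radians)

for i, (d, p) in enumerate(zip(delays, phases)):
    print(f"element {i:2d}: delay {d*1e6:6.2f} us, phase {np.degrees(p) % 360:6.1f} deg")
```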

To overcome these issues, some FUS systems adopt “dual-mode ultrasound” technology. This involves the simultaneous use of a separate probe for ultrasound imaging while therapeutic ultrasound is being applied, allowing real-time monitoring of the focal point’s position and size and timely adjustment of focusing parameters to optimize therapeutic effects. This technology is currently used in the treatment of localized organ diseases such as the prostate.

The structural design of the ultrasound probe is also crucial for FUS. Different geometric configurations can produce different ultrasound beam shapes, making them suitable for various applications. In neurosurgery, tightly spaced, unidirectional transducer arrays are typically used, imaging along a straight path through small openings in the skull. Besides the geometric configuration of the transducer array, the parameters of the ultrasound waves themselves, such as frequency and amplitude, can also be adjusted. To avoid excessive heat generation, ultrasound is usually delivered in pulses, with the pulse repetition frequency and pulse duration also being adjustable.

 

(2) Biological Effects

When ultrasound waves penetrate biological tissues, they induce a series of complex physical processes, which can be broadly categorized into thermal and non-thermal effects.

The temperature rise induced by ultrasound in tissues primarily depends on the intensity of the sound waves and the tissue’s absorption properties. Generally, the higher the ultrasound frequency, the shallower the penetration depth but the higher the resolution. This means a balance must be struck between penetration depth, resolution, and frequency. When ultrasound generates heat within tissues, the tissue’s impedance and thermal conductivity properties affect the diffusion of heat. Physiological cooling mechanisms, such as blood perfusion and thermal diffusion, also play significant roles in the tissue heating process.
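The frequency-versus-penetration trade-off can be made concrete with the textbook rule of thumb that soft-tissue attenuation grows roughly linearly with frequency, on the order of 0.5-1 dB per centimetre per megahertz. The sketch below uses an assumed coefficient of 0.6 dB/(cm·MHz); the numbers are illustrative approximations for soft tissue only and ignore the much stronger absorption of the skull.

```python
# Rule-of-thumb attenuation in soft tissue: alpha ~ 0.6 dB / (cm * MHz) (assumed value)
ALPHA_DB_PER_CM_MHZ = 0.6

def one_way_attenuation_db(freq_mhz: float, depth_cm: float) -> float:
    """Approximate one-way attenuation of an ultrasound beam in soft tissue."""
    return ALPHA_DB_PER_CM_MHZ * freq_mhz * depth_cm

for f in (0.5, 1.0, 3.0):                            # MHz
    loss = one_way_attenuation_db(f, depth_cm=8.0)   # e.g. a deep target
    remaining = 100 * 10 ** (-loss / 10)             # % of intensity left
    print(f"{f:.1f} MHz over 8 cm: {loss:4.1f} dB lost, ~{remaining:4.1f}% intensity remains")
```

Higher frequencies lose intensity much faster with depth, which is why deep targets favour lower frequencies at the cost of a larger focal spot.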

High-Intensity Focused Ultrasound (HIFU) can raise tissue temperatures high enough to denature proteins and coagulate tissue. It has been applied clinically to ablate kidney stones and tumors and to treat certain movement disorders by ablating specific brain targets. In contrast, Low-Intensity Focused Ultrasound (LIFU) induces temperature changes within the normal physiological range, avoiding irreversible damage.

The non-thermal effects of FUS include mechanical forces, radiation forces, and some organ-specific effects, such as reversible opening of the blood-brain barrier and altering neuronal membrane potentials.

The mechanical effects of FUS are evident in its ability to directly act on certain mechanosensitive ion channels and proteins, including sodium and potassium channels, thereby altering the state of neurons. High-intensity ultrasound can also physically tear tissues, but its safety is challenging to assess.

Additionally, when ultrasound intensity is sufficiently high, cavitation occurs. Cavitation refers to the growth and collapse of microbubbles during the compression and expansion phases of sound waves. The threshold for cavitation depends on factors such as sound wave frequency, temperature, and pressure. Microbubble nuclei are needed to initiate cavitation, serving as the starting points for the growth, oscillation (stable cavitation), or violent collapse (inertial cavitation) of microbubbles. In HIFU, gases released due to thermal effects can become the primary source of these microbubble nuclei. Cavitation can influence cell membrane potentials and induce micro-streaming, forming turbulence that affects surrounding cells.

In summary, FUS can produce thermal and non-thermal effects in biological tissues, with distinct impacts. HIFU can raise tissue temperatures to 43-60 degrees Celsius, causing time-dependent damage and, at higher intensities, immediate tissue damage. This damage is mainly achieved through thermal and cavitation effects. With advancements in non-invasive temperature monitoring technology, MRI-assisted HIFU therapy has gained widespread application, allowing precise control of lesion size and ensuring safety.
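The time-dependent damage mentioned above is commonly quantified with the cumulative-equivalent-minutes-at-43 °C (CEM43) thermal-dose model of Sapareto and Dewey. The sketch below is a minimal implementation of that standard formula applied to an invented temperature trace; the trace and sampling interval are illustrative.

```python
def cem43(temps_c, dt_min):
    """Cumulative equivalent minutes at 43 C (Sapareto-Dewey thermal dose).

    temps_c: sequence of tissue temperatures (deg C) sampled every dt_min minutes.
    R = 0.5 above 43 C, 0.25 below (the standard constants).
    """
    dose = 0.0
    for t in temps_c:
        r = 0.5 if t >= 43.0 else 0.25
        dose += dt_min * r ** (43.0 - t)
    return dose

# Illustrative 30-second exposure sampled once per second (dt = 1/60 min):
trace = [37.0] * 5 + [50.0] * 20 + [40.0] * 5        # brief peak at 50 C
print(f"thermal dose ~ {cem43(trace, 1/60):.0f} CEM43")
# A commonly cited threshold for thermal ablation is on the order of 240 CEM43.
```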

Conversely, LIFU can induce reversible neurophysiological responses, such as increasing or decreasing neuronal firing rates and conduction velocity, inhibiting visual and somatosensory evoked potentials, EEG, and epileptic seizures. The exact mechanisms of LIFU remain uncertain and may involve thermal effects, mechanical effects, and changes in ion channel activity, warranting further research.

 

 Figure 1. Biological effects of FUS. Source: Meng, Ying, Kullervo Hynynen, and Nir Lipsman. “Applications of focused ultrasound in the brain: from thermoablation to drug delivery.” Nature Reviews Neurology 17.1 (2021): 7-22.

 

2. Technological Development

In 1935, Gruetzmacher designed a curved quartz plate that could focus ultrasound waves to a single point, leading to the birth of the first FUS transducer. Eight years later, Lynn and colleagues at Columbia University in the United States first reported the application of FUS in the brain during animal experiments. They discovered that by instantly raising HIFU to its maximum intensity, the effects at the focal point could be maximized while minimizing damage to nearby areas.

Despite the technological limitations at the time, these findings established HIFU as a feasible method for creating a precise focal point while reducing damage along the path. They also found that surface and along-the-path damage were inversely proportional to the distance from the focal point, suggesting that the technology might be more suitable for targeting deep brain areas. Additionally, using lower frequencies could reduce absorption and heating of surface tissues, favoring absorption at the focal point. They also found that focused ultrasound could create reversible nerve damage, with ganglion cells being more susceptible than glial cells and blood vessels. These findings laid the groundwork for the subsequent development of FUS by demonstrating its ability to produce safe, reversible effects in biological tissues.

Subsequently, William Fry and Francis Fry from the University of Illinois further advanced FUS technology. Early studies showed that focused ultrasound could damage surface tissues like the scalp and skull, affecting the focus. To address this issue, the Fry brothers decided to apply focused ultrasound directly to the dura mater through craniotomy.

In 1954, the Fry research team published a seminal paper describing their method of targeting deep brain structures using a device with four focused ultrasound beams (see Figure 2). This device could be used in conjunction with stereotactic equipment, demonstrating for the first time the effectiveness of combining focused ultrasound with stereotactic techniques in animal models. They successfully ablated the thalamus and internal capsule in 31 cats, with histological examination showing cellular changes in the target area within two hours of exposure. Unlike Lynn’s findings, this experiment primarily damaged nerve fibers while leaving the neuronal cell bodies in the target area largely unaffected. Additionally, there was no significant damage to blood vessels and surrounding tissues.

 

 Figure 2. The four-beam focused ultrasound device used by the Frys. Source: Harary, Maya, et al. “Focused ultrasound in neurosurgery: a historical perspective.” Neurosurgical Focus 44.2 (2018): E2.

Meanwhile, the Fry team used precise focused ultrasound stimulation of the lateral geniculate body to temporarily suppress the brain’s response to retinal flash stimuli. Specifically, electrodes were placed on the visual cortex to measure the brain’s electrophysiological response to light stimuli. During focused ultrasound exposure, the amplitude of these evoked potentials decreased to less than one-third of the baseline value. Surprisingly, once the ultrasound stimulation ceased, these electrophysiological indicators returned to their original levels within 30 minutes. More importantly, this dose of focused ultrasound did not cause any observable histological damage to the underlying neural tissue. This discovery pioneered a new concept: FUS neuromodulation.

After achieving success in animal experiments, the Fry laboratory collaborated with the Neurosurgery Department at the University of Iowa to apply FUS in human neurosurgery. They targeted deep brain regions in patients with Parkinson’s disease, attempting to treat tremors and rigidity with FUS. In 1960, Meyers and Fry published a treatment study involving 48 patients, demonstrating the therapeutic effects of FUS on Parkinsonian tremors and rigidity.

By the latter half of the 20th century, the therapeutic potential of FUS had gradually gained recognition. However, to avoid damage and distortion to surface tissues when passing through the skull, craniotomy was necessary, making the procedure still invasive. For FUS to further advance, two critical issues needed to be addressed: transcranial focusing and real-time monitoring.

To achieve transcranial focusing, FUS had to overcome two major obstacles: local overheating of the skull and beam propagation distortion due to tissue inhomogeneity. Bone absorbs ultrasound waves 30-60 times more than soft tissue. Early experiments found that the interaction between ultrasound waves and the skull led to rapid local heating of the skull, limiting the safe level of energy that could be applied. This issue was eventually resolved by using low-frequency hemispherical transducers and actively cooling the scalp. Low frequencies reduced surface absorption, while hemispherical transducers distributed local heating over a larger surface area, and scalp cooling prevented excessive heating.

Additionally, beam propagation and focusing were significantly distorted due to the acoustic impedance mismatch between bone and brain, as well as individual variations in skull shape, thickness, and the ratio of cortical bone to marrow. Until the early 1990s, this problem remained unresolved. The emergence of phased array technology, which corrects for delays and changes encountered during wave propagation by applying different phase shifts to each element, finally enabled precise targeting previously achievable only through craniotomy. Coupled with acoustic feedback techniques to accurately measure phase shifts caused by the human skull, focused ultrasound technology overcame this critical obstacle. These groundbreaking advancements laid the foundation for the modern development of focused ultrasound, making completely non-invasive treatment of deep brain structures possible.

Early applications of HIFU thermal ablation were primarily by surgeons for treating prostate, urinary system, breast, and gynecological tumors. In these applications, physicians could use diagnostic ultrasound to guide and monitor the treatment process in real-time. However, in neurosurgical applications, the skull impeded ultrasound imaging of internal tissue changes. In the late 1980s and early 1990s, Dr. Jolesz’s team pioneered the use of intraoperative magnetic resonance imaging (MRI) to address this issue. Subsequently, they turned their attention to using MR thermometry to monitor temperature changes within the brain in real-time during focused ultrasound treatment.

By the late 1990s, the Jolesz team discovered that low-power FUS could raise the temperature of the target area to 40-42 degrees Celsius without causing damage. This sub-threshold ultrasound exposure generated a thermal signal that could be located and targeted using MR thermometry, preparing for subsequent high-power ablative exposure. In the following years, Jolesz and colleagues focused on characterizing the thermodynamics, ultimately achieving the prediction of lesion size after continuous exposure and real-time monitoring of the thermal damage process.

 

3. Clinical Applications

(1) HIFU-based Thermal Ablation Therapy

HIFU can produce therapeutic effects by raising the temperature of the target tissue. When the temperature increases to 40-45 degrees Celsius, it can enhance the sensitivity of tumors to radiotherapy or aid in the release of drugs from thermosensitive liposomes. When the temperature exceeds 56 degrees Celsius, it causes tissue denaturation and necrosis.

For common tremor disorders, such as essential tremor, focused ultrasound can target and ablate key areas like the ventral intermediate nucleus (VIM) of the thalamus or the cerebellar-thalamic tract (CTT), effectively alleviating patients’ tremor symptoms. Numerous clinical studies have confirmed that unilateral FUS surgery targeting the VIM or CTT significantly improves patients’ tremor and quality of life, with most adverse effects, such as sensory disturbances and gait abnormalities, being temporary.

In Parkinson’s disease, focused ultrasound also offers multiple target options. For patients primarily exhibiting tremors, ablation of the VIM can be chosen; for motor disorders, the subthalamic nucleus (STN) or globus pallidus internus (GPi) can be targeted; for motor complications, the pallidothalamic tract (PTT) can be targeted. These FUS surgeries effectively improve motor symptoms in Parkinson’s patients, although adverse effects such as speech disorders may occur.

Additionally, focused ultrasound has applications in treating psychiatric disorders like obsessive-compulsive disorder (OCD) and depression. By ablating the anterior limb of the internal capsule (ALIC), it can effectively alleviate symptoms such as obsessive thoughts, depression, and anxiety without causing cognitive decline.

 

 Figure 3. Applications of FUS in the human brain. Source: Meng, Ying, Kullervo Hynynen, and Nir Lipsman. “Applications of focused ultrasound in the brain: from thermoablation to drug delivery.” Nature Reviews Neurology 17.1 (2021): 7-22.

 

(2) Opening the Blood-Brain Barrier

The blood-brain barrier (BBB) is a barrier formed by the walls of brain capillaries, glial cells, and the choroid plexus. Its main function is to regulate the entry and exit of substances into the brain, maintaining the brain’s stable environment. Although the BBB blocks harmful substances from entering the brain, it also hinders the entry of drugs, especially large-molecule drugs, for disease treatment.

Research has found that low-intensity focused ultrasound (LIFU) can safely and reversibly open the BBB. After injecting microbubbles, ultrasound causes these microbubbles to oscillate, temporarily disrupting the tight junctions of the BBB and allowing better drug penetration into the brain. This method has been shown in animal experiments to effectively enhance the treatment of neurological diseases such as brain tumors, Parkinson’s disease, and Alzheimer’s disease.

Furthermore, this technique of opening the BBB can non-invasively release biomarkers such as phosphorylated tau protein into the blood, aiding in the early diagnosis and monitoring of neurodegenerative diseases and brain tumors. It can also modulate the neuroimmune system to achieve therapeutic effects, such as reducing amyloid plaques and hyperphosphorylated tau protein in Alzheimer’s disease models, promoting adult neurogenesis, and altering the tumor microenvironment.

 

(3) LIFU-based Neuromodulation

In addition to opening the BBB, LIFU can precisely modulate neural activity in specific brain regions by altering the permeability of neuronal cell membranes and activating ion channels. Clinical studies have confirmed that focused ultrasound can modulate cortical functions in the human brain, inducing plastic changes. It can alter functional connectivity in the brain and affect neurochemical substances in the deep cortex. Some studies have shown that using a navigation system to precisely target specific brain regions can safely and effectively reduce the frequency of seizures in epilepsy patients, improve symptoms of neurodegenerative diseases, alleviate neuropathic pain, and reduce depression.

Compared to existing neuromodulation techniques, FUS has several potential advantages: unlike transcranial direct current stimulation (tDCS) and transcranial magnetic stimulation (TMS), FUS can target deep brain regions with millimeter-level spatial resolution. Compared to deep brain stimulation (DBS), FUS is less invasive, avoiding surgical risks and allowing for repeated treatments. By adjusting the position or direction of the transducer, multiple brain regions such as the hippocampus, prefrontal cortex, motor cortex, caudate nucleus, and substantia nigra can be stimulated.

 

 Figure 4. Research status of FUS therapy for brain diseases. Source: Focused Ultrasound Foundation. “State of the Field Report 2023 – Focused Ultrasound Foundation.” Focused Ultrasound Foundation, 20 Sept. 2023, www.fusfoundation.org/the-foundation/foundation-reports/state-of-the-field-report-2023.

 

4. Conclusion

The role of FUS in neuroscience and clinical therapy is increasingly prominent. By precisely controlling the focused energy of sound waves, FUS not only enables precise treatment of brain lesions but also shows great potential in neuromodulation and drug delivery. Its characteristics include high localization accuracy and relative non-invasiveness.

As of 2022, the FUS field has received $3.14 billion in R&D investments from government and industry, with 337 therapies approved by 39 regulatory agencies targeting 32 indications and treating a total of 565,210 cases. Currently, dozens of therapies are still under active development.

However, the development of FUS still faces a series of technical and clinical challenges. High-intensity FUS thermal ablation is currently inefficient for large lesions and for peripheral brain regions, and its use is limited in patients with low skull density. Additionally, for lesions near the skull base, nearby sensitive neurovascular structures may be at risk. It is anticipated that, in the coming years, optimized ultrasound focusing and correction technologies, as well as more personalized ultrasound transducer arrays, will emerge to minimize heating and expand the treatable range.

 

 Figure 5. Changes in the number of FUS-related publications over time. Source: Meng, Ying, Kullervo Hynynen, and Nir Lipsman. “Applications of focused ultrasound in the brain: from thermoablation to drug delivery.” Nature Reviews Neurology 17.1 (2021): 7-22.

Clinically, current research aims to improve the tolerability of FUS treatment, for example, by shortening surgery times and using neuroimaging-assisted devices. Additionally, exploring the application of FUS in new clinical indications, such as the treatment of brain lesions inducing epilepsy, is a future direction of development.

The future of FUS applications depends on interdisciplinary collaboration, involving joint efforts from medicine, physics, and neuroscience. Through in-depth research into its mechanisms and clinical applications, FUS holds immense potential to help us tackle some of the most challenging brain diseases faced by humanity.

What Will Future Life Forms Look Like? https://admin.next-question.com/features/what-will-future-life-forms-look-like/ https://admin.next-question.com/features/what-will-future-life-forms-look-like/#respond Mon, 30 Sep 2024 16:20:59 +0000 https://admin.next-question.com/?p=2467

Life is a complex and mysterious phenomenon. Feynman once said, “What I can’t create, I do not understand.” To truly understand life, the best approach might be to create it. The urge to create life transcends civilizations and permeates human history, evident in the golems of ancient Jewish folklore, the Talos of ancient Greece, and Yanshi of the Zhou Dynasty in China, through to literary works like Frankenstein and modern science fiction. Today, this urge is driven not only by the need for functionality or productivity but also by advances in technology, and in the fields of artificial life and artificial intelligence in particular it has contributed significantly to understanding ourselves and our place in the universe.

Artificial Life (Alife) is an emerging interdisciplinary research field that aims to explore various potential forms of life through computational models and physical simulations. Alife seeks not only to simulate and replicate known forms of life on Earth but also to create entirely new, possible life forms, thereby expanding our understanding of the concept of “life.” This includes using computer simulations to model life processes, studying self-organizing systems, and developing software and hardware systems that can simulate natural selection and evolutionary processes. This article will review the history of Alife and introduce its latest research directions and practical advancements.

 

1. From Life to Alife

In traditional biology, life is typically considered a natural system composed of DNA and proteins, characterized by metabolism, reproduction, development, genetic evolution, and other traits. It wasn’t until 1944, when physicist Erwin Schrödinger published “What Is Life?”, that life was redefined beyond specific physiological structures as an energy-information coupled negentropic system. This concept not only addressed past theories but also resonated with contemporary thinking, bridging organicism and mechanism, and laying the theoretical groundwork for the study of Alife.

Around the same time, scientists from various fields began exploring Alife, which can be divided into three waves:

The First Wave: Exploration of Self-Replication (1950s-1960s)

During this period, scientists focused on the fundamental principles of self-replication, self-organization, and evolution, which sustain and generate the structure of life. Areas involved included cybernetics, Turing machines, morphogenesis, neural network models, and genetic algorithms. Notably, John von Neumann’s 1948 work on self-replicating automata laid the foundation for the discipline of artificial life.

In this wave, scientists viewed life as a form of logic, emphasizing abstract models of information processing. In practice, the Norwegian-Italian mathematician Nils Barricelli used one-dimensional cellular automata on the early IAS computer between 1953 and 1962 to simulate a digital world capable of indefinite evolution, even observing digital symbiotic life forms. He argued that Darwin’s theory of competitive evolution was insufficient to explain the entirety of life’s evolution, highlighting the significant roles of symbiosis and cooperation [1]. This wave clarified life’s fundamental characteristics through models and algorithms, dispelling myths about life’s origins and establishing the theoretical and practical foundations of artificial life.

The Second Wave: Computational Simulation Period (1970s-1990s)

During this period, scientists extensively utilized computational simulation technologies, including cellular automata, neural networks, and digital evolutionary systems, to study characteristics of life systems such as adaptability and emergence.

In September 1987, Christopher Langton organized the first Artificial Life Conference, marking the birth of the discipline. Alife was defined as the study of “life as it could be,” [2] in contrast to traditional biology’s focus on known life forms on Earth. Von Neumann’s cellular automata, further developed by John Conway (Game of Life), Stephen Wolfram, and Langton himself, became one of Alife’s most crucial theoretical models.

Langton defined a cellular automaton activity parameter λ, where an intermediate value of λ results in automata exhibiting both localized stable patterns and unorganized chaotic behavior, known as the “edge of chaos.” He suggested that life or intelligence originates from this “edge of chaos,” an important concept in complex systems theory. This wave advanced computational simulation techniques to study complex system characteristics like adaptability and emergence, marking the establishment of the Alife discipline and the beginning of engineering practice.
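To see what λ actually measures, consider the toy calculation below: for a one-dimensional, two-state cellular automaton, λ is simply the fraction of rule-table entries that map to a non-quiescent state. The rule numbering follows the standard elementary-CA (Wolfram) convention; reading particular λ values as the “edge of chaos” is of course far more subtle than this sketch suggests.

```python
def lambda_parameter(rule_number: int, quiescent_state: int = 0) -> float:
    """Langton's lambda for an elementary (2-state, radius-1) cellular automaton.

    The rule table maps each of the 8 neighbourhoods to 0 or 1; lambda is the
    fraction of entries that do NOT map to the quiescent state.
    """
    table = [(rule_number >> i) & 1 for i in range(8)]   # Wolfram rule encoding
    non_quiescent = sum(1 for out in table if out != quiescent_state)
    return non_quiescent / len(table)

for rule in (0, 110, 30, 255):
    print(f"rule {rule:3d}: lambda = {lambda_parameter(rule):.3f}")
# Rule 110 (famously capable of rich, universal computation) sits at an intermediate
# lambda, while rule 0 (everything dies) and rule 255 (everything fills) are the extremes.
```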

The functionalist stance and abstract model simulations of the first two waves helped capture the universal essential features of life. As Langton believed, life is an appropriate form independent of its substrate, meaning different materials can achieve the same life and intelligence functions [3]. Consequently, the main outcomes of these waves were software/virtual/digital artificial life (Soft Alife).

However, Alife research has always been accompanied by skepticism and reflection. Contemporary theoretical biologist Robert Rosen, in his 1991 work “Life Itself,” argued that life forms are “closed to efficient causation” (M,R) systems*[4], where the drivers of change must also be the products of the system. For example, enzymes in life systems catalyze metabolic reactions and produce themselves, creating a cyclical causal structure that grants life a certain autonomy and self-generative capability, implying that life forms cannot be entirely simulated by Turing machines.

* (M,R) System: Metabolism (M) and Repair (R). Metabolism involves mechanisms (denoted f) that transform materials A into products B, while repair involves mechanisms (denoted Φ) that synthesize the metabolic mechanism f from the metabolic products B.

At the same time, biologist and cognitive scientist Francisco Varela also opposed Langton’s view [5]. He argued that the contextuality and historicity of organisms are irreducible and that the embodied interaction between organisms and their environment is more important than their underlying logic or functional patterns. These considerations led to the third wave of artificial life.

▷ Spectrum of Alife Research Methods: Abstract Models, Simulations, Digital Evolution, Biochemical Experiments, and Field Studies, Reflecting the Development from Abstract Models to Real-world Embedding, and the Soft, Hard, Wet, and Hybrid Paradigms. [6]

The Third Wave: Embedded Evolution Period (Late 1990s-Present)

This period’s research focuses more on the relationship between artificial life and real-world environments, emphasizing the embeddedness, embodiment, interaction, and emergence of life in the real world. In 1972 and 1991, Varela and collaborators introduced the concepts of autopoiesis and embodied mind, respectively, emphasizing the self-generative capacity of life systems and the dynamic, embedded emergent processes resulting from their embodied interactions with the environment. These ideas provided new theoretical perspectives for Alife research.

Researchers attempted to introduce “upward tension” within systems to avoid equilibrium states, construct self-referential evolution and dynamics [7], and incorporate the environment and the observer (e.g., attention and operation) into system function descriptions. Laboratory experiments aimed to drive systems to break through and generate innovative functions and behaviors within a self-organizing recursive loop, evolving toward broader directions in an ecosystem.

In addition to digital artificial life (Soft Alife), wetware (Wet Alife), hardware (Hard Alife), and hybrid artificial life (Hybrid Alife) became representative fields during this period. These included explorations in molecular-cellular level artificial chemistry, synthetic biology, self-replicating robots, and other areas, expanding research from artificial organisms to artificial ecosystems and societies. Even within digital artificial life research, scientists focused on digital embodied forms [8] and digital ecologies.

▷ EvoSphere: An autonomous evolutionary robotic system that continuously selects individuals capable of collecting nuclear waste in the environment. These individuals’ genotypes are then optimized through wireless “mating,” and new robot offspring are produced using 3D printing technology. [9]

Overall, the development of Alife can be divided into two axes: the horizontal axis represents technical implementation methods, from software computational simulations (Soft Alife) to physical hardware implementations (Hard Alife), biochemical experiments (Wet Alife), and bionic hybrid systems (Hybrid Alife). The vertical axis represents research focus, from artificial organisms to artificial ecosystems, open evolution, and symbiosis.


 

2. Artificial Ecosystems and Societies

In nature, organisms typically do not exist in isolation. For instance, many primates and social insects live in populations that coexist within ecosystems. Ecosystems, as the driving force behind the evolution of life, consist of the environment and all the organisms within it, forming dynamic systems filled with ecological interactions. Ecological research can be divided into community ecology, which focuses on interactions among biological populations, and ecosystem ecology, which examines interactions between organisms and their physical environment. Unlike simply creating artificial organisms, this branch of Alife studies collections of interacting artificial entities.

In natural ecological communities, different species interact with each other. In Alife, researchers often analyze community dynamics by viewing individuals as specific genotypes, phenotypes, or ecotypes. William Reiners [10] suggested that unified systems ecology research requires at least three independent and complementary theoretical frameworks: energetics, material science, and the evolution of population interactions or ecosystem “connectedness.” These frameworks can be understood respectively as energy flow, material flow, and causal information.

The flow and transformation of energy and materials include trophic levels, food chains and webs [11], productivity, and biogeochemical cycles (such as the carbon cycle). Population interactions or ecosystem connectedness encompass collective behaviors within populations [12], interactions among communities (such as predation, competition, and symbiosis), and niche studies exploring how these niches emerge, disappear, and evolve [13].

Theoretical techniques for studying the evolution of Alife artificial ecosystems mainly include complex dynamic systems, cellular automata, feedback networks, and cybernetics. Digital evolution simulation platforms provide tools for examining the dynamics of complex evolution and ecosystems.

 Common Alife Digital Evolution Simulation Platforms. [6]

Some simulation platforms set up energy-material flow simulations. For example, in Tierra, energy is defined as the CPU cycles (computer processing time) required to execute instructions. Agents consume CPU cycles to execute instructions, altering their local environment. Minimizing energy consumption (CPU usage) can improve replication efficiency, exerting selective pressure on agents to evolve towards more efficient and reliable directions. In the Avida platform, organisms compete by metabolizing different limited resources (energy).
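A toy version of this “energy as CPU time” idea might look like the following sketch; it is a deliberately simplified, hypothetical model, not the actual Tierra or Avida code. Each digital organism is reduced to the number of CPU cycles its copy loop costs, a fixed cycle budget is shared out each generation, and leaner replicators therefore leave more offspring.

```python
import random

random.seed(1)

# Each organism is just the number of CPU cycles its replication loop costs.
population = [random.randint(80, 120) for _ in range(50)]
CYCLES_PER_GENERATION = 4000          # shared "energy" budget (illustrative)

for generation in range(30):
    offspring = []
    budget = CYCLES_PER_GENERATION
    # Cheaper programs get scheduled more often within the same budget.
    for cost in sorted(population):
        if budget < cost:
            break
        budget -= cost
        child = cost + random.choice([-2, 0, 2])   # small mutation of the copy loop
        offspring.append(max(10, child))
    population = offspring or population

print(f"mean replication cost after selection: {sum(population)/len(population):.1f} cycles")
```

Even in this stripped-down form, the population drifts toward cheaper replication, which is the selective pressure the Tierra-style energy accounting is meant to capture.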

Alife ecological research systems differ from natural systems in that researchers can set parameters far beyond those commonly found in nature [15]. This allows researchers to explore which characteristics are essential for life and which traits and events might be mere coincidences of Earth’s life forms. Thus, Alife can not only “replay the tape of life” but also design and play tapes that run very differently from those on Earth (such as on Mars), helping to clarify and extract the essential properties of life.

Artificial ecological construction can be divided into four forms: soft, hard, wet, and hybrid artificial ecosystems. Alife digital platforms naturally simulate soft artificial ecosystems, including artificial life software systems or games and digital ecological art based on evolutionary computation. Hard artificial life ecosystems mainly involve the interaction of machine cluster systems. Wet artificial life ecosystems attempt to construct ecological communities composed of organisms that typically do not coexist, such as communities performing ideal tasks like waste decomposition and carbon sequestration. This subfield is known as synthetic ecology. As for hybrid artificial ecosystems, the most famous is the Flora Robotica project [16], which created a symbiotic ecosystem of plants and robots, where robots control plant growth by emitting different colored lights at different locations.

 Software, Wetware, and Hardware Artificial Ecosystems.

Research and practice in the field of artificial ecosystems are extensive. For example, as complexity increases, ecosystems can promote social and cultural interactions [17]. This involves interactions of predation, cooperation, and culture, sparking studies on perception, communication, and language [18]. Moreover, in the field of artificial intelligence, research on multi-agent systems and artificial collective intelligence can also be attributed to the study of artificial ecosystems and societies [19].

 

3. Open-Ended Evolution

“Nothing in biology makes sense except in the light of evolution.”

— Theodosius Dobzhansky, American evolutionary biologist

Driven by differential selection, populations and ecosystems that reproduce, inherit, and vary undergo Darwinian evolution. Evolutionary algorithms [20] use similar concepts to search for the optimal values of fitness functions in the external environment, often leading to unexpected results [21].
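As a concrete, if deliberately simple, illustration of that idea, the sketch below runs a textbook genetic algorithm (tournament selection, single-point crossover, bit-flip mutation) to maximise an arbitrary fitness function. The bit-string genomes and the “one-max” fitness are illustrative choices, not methods from the cited studies.

```python
import random

random.seed(0)
GENOME_LEN, POP_SIZE, GENERATIONS, MUT_RATE = 32, 60, 80, 0.02

def fitness(genome):                 # "one-max": count the 1 bits
    return sum(genome)

def tournament(pop, k=3):            # pick the best of k random individuals
    return max(random.sample(pop, k), key=fitness)

def crossover(a, b):                 # single-point crossover
    cut = random.randrange(1, GENOME_LEN)
    return a[:cut] + b[cut:]

def mutate(genome):
    return [bit ^ 1 if random.random() < MUT_RATE else bit for bit in genome]

population = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
for gen in range(GENERATIONS):
    population = [mutate(crossover(tournament(population), tournament(population)))
                  for _ in range(POP_SIZE)]

best = max(population, key=fitness)
print(f"best fitness after {GENERATIONS} generations: {fitness(best)}/{GENOME_LEN}")
```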

However, natural evolution is much more complex than artificial simulations at multiple levels, including genomic complexity, population size, number of generations, reproductive strategies (from horizontal gene transfer to sexual selection), the roles of gene regulation, development, and epigenetics, interactions among multiple co-evolving populations, and intrinsic definitions of fitness. These more natural evolutionary characteristics constitute the core content of Alife evolution research [14][22].

As Darwin said, “Endless forms most beautiful,” [23] Open-Ended Evolution (OEE) [24] is an important research area in Alife. OEE not only captures the fundamental characteristics of existing life systems, including their interactions with the environment, but also provides a framework to explore and simulate the possibility space of potential life forms. This can inspire various optimization, learning, and evolutionary algorithms during the research process. Thus, achieving OEE can be considered the ultimate goal of artificial life.

Unlike general evolution, an open-ended evolution system never settles into a single stable equilibrium, continuously generating novelty [25] and infinitely increasing complexity. This indicates that open-ended evolution represents a multi-scale evolutionary process [26]. The field of OEE involves questions about the origin of life [27], major evolutionary transitions in the emergence of complexity and organizational levels [28], and meta-evolution, which is the evolution of evolutionary capabilities and their evolution [29].

Several hypothetical conditions are necessary to produce open-ended evolution [30]:

 

(1) Infinite Genetic Space of Potential Genotypes: This does not mean that genome length must grow indefinitely. Because dominant genes mutate and spread relatively slowly, diversity may instead be generated quickly by a large amount of non-protein-coding regulatory DNA (once dismissed as “junk DNA”) that regulates those genes. This implies that merely changing regulatory switches, much like adjusting weights in a neural network, can quickly give rise to new species. This hypothesis helps explain both the shared gene set of the Last Universal Common Ancestor (LUCA) and the Cambrian explosion of life.

(2) Multiple Mutation Pathways Between Potential Phenotypes: This means that a given trait should be reachable through many different mutation pathways. For example, the eyes of humans and octopuses, though functionally similar, evolved independently, and the fins of fish and dolphins are a classic case of convergent evolution. Theoretical biologist Stuart Kauffman illustrated this with random Boolean networks, finding that the long-term behavior of a gene regulatory network is determined by the number of nodes and the in-degree K of each node. When K=2, the network sits at the edge of chaos: neither frozen at a fixed point, nor simply oscillating, nor completely chaotic. Each attractor of the network corresponds to the gene-expression state of a cell, indicating that different genomes can reach the same adaptive phenotype through various evolutionary paths (a minimal simulation of such a network is sketched after this list).

(3) Dynamic Adaptive Landscape: This means the environment surrounding organisms is constantly changing. On the one hand, as populations evolve, their environment changes with their behavior. On the other hand, the dynamic environment continuously selects the population, affecting genotype and phenotype realization through mechanisms like epigenetics.
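Kauffman’s random Boolean networks, mentioned in condition (2), are straightforward to simulate. The sketch below builds a hypothetical network of N nodes, each driven by K = 2 randomly chosen inputs and a random Boolean function, and follows one trajectory until it falls into an attractor; the network size and random seed are arbitrary.

```python
import random

random.seed(42)
N, K = 12, 2    # small illustrative network; Kauffman's "edge of chaos" regime is K = 2

# Each node gets K random inputs and a random Boolean function (truth table of size 2**K).
inputs = [random.sample(range(N), K) for _ in range(N)]
tables = [[random.randint(0, 1) for _ in range(2 ** K)] for _ in range(N)]

def step(state):
    new = []
    for node in range(N):
        idx = 0
        for src in inputs[node]:               # pack the K input bits into a table index
            idx = (idx << 1) | state[src]
        new.append(tables[node][idx])
    return tuple(new)

state = tuple(random.randint(0, 1) for _ in range(N))
seen = {}
t = 0
while state not in seen:                       # iterate until the trajectory revisits a state
    seen[state] = t
    state = step(state)
    t += 1
print(f"transient length: {seen[state]}, attractor cycle length: {t - seen[state]}")
```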

 

Open-ended evolution can be studied using cellular automata, digital evolution simulation platforms like Tierra, Polyworld, Avida, and artificial chemistry [31] simulation platforms like Stringmol [32]. A 2019 study [33] identified three types of novelty and their corresponding openness based on the relationship between phenotypic behaviors and the search space of evolution:

 

Exploratory Openness: Novelty describable by current models, usually involving recombinations of existing components or modifications of existing parameter values. For example, new allele combinations in a genome might determine the number of vertebrae in new vertebrates.

Expansive Openness: Novelty requiring changes to the model but still using concepts existing within the current meta-model, involving discovering states that open new neighborhoods in the state space. This often involves utilizing previously untapped chemical or physical laws or the emergence of new boundary conditions or mechanisms. For example, the emergence of flight wings or visual sensory systems.

Transformational Openness: Novelty introducing new concepts, requiring changes to the meta-model. This involves not only new physical laws but also transitions at organizational levels. For instance, synthesizing a new chemical substance previously unused in metabolic reactions. The origin of life, eukaryotic cells, multicellular organisms, brains, self-awareness, writing, and technology belong to this category, often corresponding to strong emergence in complex systems.

 

 Three Types of Openness: Adaptation of Darwin’s finches to ecological niches; emergence of wings; eukaryotic cells.

Various metrics are used to detect [34] or quantify the potential for openness, such as the Measures of Open-Ended Dynamics (MODES) metric [35]. The effectiveness and applicability of these metrics have been verified in different experimental environments like the NK model and Avida digital evolution platform.

 Relationships Between MODES Indicators.

In MODES, the potential for complexity requires organisms to continuously integrate more environmental information into their genomes, generating increasingly complex behaviors. Ecological potential describes the ability of a population as a whole to absorb and reflect environmental information, including creating new niches and trophic levels through interactions with biotic and abiotic environments.

To achieve sustained open-ended change, environmental and ecological factors must be integrated into the evolution of organisms; open-ended evolution systems can thus be viewed as Rosen’s (M,R) systems. By its own account, the large language model ChatGPT exhibits mainly exploratory openness. Research in open-ended evolution is bound to provide important methods and paradigms for computational creativity and AGI research.

 

4. Symbiosis

“Nothing in evolution makes sense except in the light of parasitism,” states the title of a 2021 paper [36], echoing Dobzhansky’s famous description of evolution. This might not be an exaggeration. Numerous studies indicate that the evolution of the parasitism-mutualism continuum [37] is a crucial mechanism for generating biological novelty, shaping ecological diversity, and driving significant transitions in Earth’s life. As mentioned earlier, Nils Barricelli’s early Alife simulations recognized this phenomenon. Nick Lane, in “The Vital Question,” also argues that endosymbiosis increased the energy efficiency of life, facilitating the origin of eukaryotes and their eventual evolution into complex life on Earth.

Symbiosis can be seen as the result of intimate co-evolution at various ecological scales among organisms and populations [38]. Depending on spatial relationships, symbiosis can be divided into ectosymbiosis and endosymbiosis. Based on interspecies benefit relationships, symbiosis usually includes mutualism, commensalism, neutral interactions, amensalism, competition, and parasitism [39]/predation (broadly defined) [40]. Resource exchange-utilization dependency cycles in ecosystems can lead to community building and symbiotic evolution, even forming cross-feeding phenomena where the metabolic products of one species become the resources for another [41].

 Symbiotic Relationships [42]. The author combines figures from https://en.wikipedia.org/wiki/Symbiosis and [40] Figure 1.1.

Symbiosis can also be studied using the Game of Life and other digital evolution simulation platforms. For instance, researchers developed the Model-S [43] based on Conway’s Game of Life, which successfully simulated the evolution of self-organization, autopoiesis, multicellularity, the emergence of sexual reproduction, and species symbiosis fusion strategies. Studies found that even a small amount of symbiosis could significantly increase population fitness and potentially support open-ended evolution. In digital evolution systems supporting symbiosis [44], digital organisms survive and develop by consuming “CPU time” resources. Endosymbionts consume resources within the host, similar to mitochondria in real cells. Similarly, digital ribosomes transform digital genomes (binary code) into outputs or behaviors, akin to ribosomes in biological cells that read genetic codes and synthesize proteins.
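To give a flavour of how such systems can express the parasitism-mutualism continuum, here is a purely hypothetical sketch, not the model of reference [44]: a host grants its endosymbiont a share of its CPU cycles, the symbiont returns some metabolic benefit, and the sign of the net effect decides whether the pairing looks parasitic or mutualistic.

```python
def host_payoff(cpu_budget, share_to_symbiont, benefit_rate):
    """Net CPU available to a host that carries an endosymbiont.

    share_to_symbiont: fraction of the host's cycles consumed by the symbiont.
    benefit_rate: extra cycles returned per cycle the symbiont consumes
                  (e.g. more efficient "metabolism", as with mitochondria).
    All numbers are illustrative.
    """
    given = cpu_budget * share_to_symbiont
    returned = given * benefit_rate
    return cpu_budget - given + returned

solo = 1000.0                                   # host without a symbiont
for benefit in (0.2, 1.5, 2.5):
    carrying = host_payoff(1000.0, share_to_symbiont=0.3, benefit_rate=benefit)
    label = "parasitic" if carrying < solo else "mutualistic"
    print(f"benefit_rate {benefit:3.1f}: host ends with {carrying:6.1f} cycles ({label})")
```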

 Model-S Simulation of the Migration Game Producing a “Symbiotic Layer” [43] and the Simulation of Endosymbiosis in Digital Evolution Systems [44].

Life can be said to have been a symbiotic phenomenon from its origin, spanning different types of replicators from genes to culture (memes) to technology as extended phenotypes (techmemes). In the future, the fusion of biological and digital might enter a new era of human-machine symbiosis [45].

If the substrate of biological evolution is the natural physical environment, what is the substrate and ecological environment for the evolution of technobiological entities? Clearly, this must be a system interoperable by both human and machine life forms. Some studies [46] suggest that blockchain-based distributed systems possess characteristics such as environmental responsiveness, growth and change, genetic replication, and the ability to maintain homeostasis, many of which align with definitions of life. When combined with AI technologies such as neural networks, such systems could become self-organizing systems with advantages over traditional life forms. These articles suggest that distributed virtual machines (dVMs) on public blockchains provide an open environment for autonomous evolution, supporting the development of artificial general intelligence (AGI).

 Evolution of Evolutionary Substrates [45]; DNA and Blockchain [46].

Recently, a viewpoint known as Digital Matter Theory has emerged in the blockchain technology field, proposing that digital information can be viewed as a form of digital matter, akin to physical substances like wood or metal. By mining inherent patterns in blockchain data, a new form of digital matter can be created, termed the blockchain element table. It is seen as a representation of the material substrate and assets in the digital world. As block height increases, this digital world grows unpredictably, similar to the material world. In such systems, non-arbitrary resources/tokens (non-arbitrary, meaning the total amount is not human-defined or arbitrarily issued) can be created, providing the underlying logic and foundation for autonomous worlds and the metaverse.

 Digital Matter Theory, Autonomous Worlds, and the Blockchain Life Simulation Game Cellula [55].

Moreover, there are now blockchain-based Alife projects. For example, the blockchain life simulation game Cellula allows players to create various gene sequences and nurture their own “life” on the blockchain. In this game, block height acts as “time,” and each “life” grows, evolves, and dies within the blockchain space-time ecosystem. Since blockchain Alife involves both players and blockchain agent entities, it represents a typical form of reciprocal symbiosis. Additionally, some projects explore the potential of symbiosis between non-human entities like plants and digital or machine entities using blockchain technology or physical environments, such as the terra0 and Flora Robotica projects.

 Examples of Cross-media and Cross-species Symbiosis.

Of course, a more broadly discussed form of symbiosis is between humans and artificial intelligence. This primarily involves task and collaboration research, human-AI value alignment [47], and particularly the AI theory of mind [48]. These studies largely reflect anthropocentric views and symbiotic design considerations that prioritize human interests.

 Dimensions of Human-Machine Symbiosis Methods [49].
 

5. Biologically Inspired AI and the Future of Artificial Life

Looking back, it is evident that AI was long constrained by the limitations of symbolic artificial intelligence, or GOFAIstic [50]. In contrast, Alife and biologically inspired methods, also known as cybernetics-inspired approaches, have provided crucial insights and paradigms for the development of AI.

Geoffrey Hinton’s pioneering work on deep learning networks essentially simulates human brain neurons, while large models like Transformers emulate human brain memory and learning mechanisms to some extent. The hippocampal memory model of the human brain is akin to a causal Transformer with RNN positional encodings [51]. OpenAI, the creator of ChatGPT, constantly reminds its employees of the “bitter lesson,” firmly believing in the scaling law for emergent properties. They argue that only by continually increasing data and computational power can AGI be achieved. Reinforcement learning, including Reinforcement Learning from Human Feedback (RLHF), draws from animal behavior psychology and is fundamentally a feedback-based environmental adaptation mechanism. Generative Adversarial Networks (GANs) derive from ecological mechanisms of competition and symbiosis in animals.

 GOFAI and Biologically Inspired (Cybernetic) AI [52].

With technological advancements, Alife researchers have continually explored various paradigms, from abstract models to computational simulations, embodied embedding, artificial ecosystems, and open-ended evolution, all driving further AI development. For example, Karl Friston’s Free Energy Principle and Yann LeCun’s Joint Embedding Predictive Architecture (I-JEPA) [53] attempt to integrate more biological principles into AI systems to enhance their interaction with the environment and autonomy. Even non-Von Neumann architectures, Hinton’s later proposal of mortal computation, the widely discussed human-AI value alignment, and ecologically inspired artificial intelligence [56] reflect the influence of Alife’s developmental trajectory.

Currently, generative AI, especially large language models, significantly impacts Alife research [54]. Large language models are used to explore the emergence of meaning, causal emergence, and open-ended evolution, particularly in human-machine collaboration, artificial collective intelligence, and multi-AI agent systems like the famous Stanford AI village. Additionally, there are algorithmic integrations such as neuroevolution, which combine neural networks and genetic algorithms. Researchers have used simple virtual organisms [57] with visual and auditory perceptions to interact with their environment, conspecifics, and predators, discovering that some individuals develop a “fear center” module to specifically respond to predators. This indicates that neural mechanisms akin to emotions and consciousness in biological brains can emerge in artificial substrates.

Finally, if we think in terms of the Alife paradigm, the questions before us are: How can blockchain-based systems and digital native substrates achieve autonomous (M,R) systems? How will large language models and AI agents evolve transformative openness? What kind of symbiotic relationship will AI and humans form—competition, endosymbiosis, or cross-species feeding? How will virtual reality and augmented reality impact future digital ecosystems?

Regardless of the future, Alife, AI, blockchain, and the metaverse (AR/VR/XR) will intertwine, leading to a symbiotic world of natural life and virtual entities, and digital and physical integration. In such a post-human era, life entities, in whatever form, will continue their journey of open-ended evolution in the boundless universe.

Acupuncture: The Revival of Ancient Wisdom in Modern Neuroscience https://admin.next-question.com/features/acupuncture-ancient-wisdom/ https://admin.next-question.com/features/acupuncture-ancient-wisdom/#respond Sat, 28 Sep 2024 19:19:52 +0000 https://admin.next-question.com/?p=2452

In modern medicine, treating chronic pain remains a significant clinical challenge. Although numerous molecular targets associated with pain have been identified, efforts to develop non-addictive and safer analgesics have achieved limited success. A primary reason for this is that pain treatment involves more than just targeting a single molecule; it requires addressing the complex interactions between the nervous system, immune system, and target tissues. Traditional drug therapies often fail to account for this systemic dynamic. However, the recent emergence of bioelectronic medicine offers new perspectives and solutions.

This article explores how neuromodulation can directly influence organ function, potentially bypassing the common side effects of traditional drug treatments. Specifically, it focuses on the resurgence of acupuncture in modern medicine and its potential application in managing systemic diseases. By analyzing recent research advancements, we will show how acupuncture activates specific autonomic neural circuits (somatic-autonomic reflexes activating sympathetic/parasympathetic pathways), effectively controlling inflammatory responses and other complex disease states.

Additionally, in collaboration with the NIH-funded SPARC program, this article discusses future directions in acupuncture research and neuromodulation technologies, aiming to provide new therapeutic approaches and scientific insights for clinicians and researchers.

 

1. Exploring New Therapeutic Strategies in Neural, Immune, and Tissue Interactions

During disease progression, the human body is not just a collection of isolated molecular targets but a system involving the dynamic interaction of the nervous system, immune system, and target tissues. The nervous system not only connects the brain and spinal cord but also interacts closely with the immune system and major organs, regulating their functions through subtle electrical signals. When these signals malfunction, organ dysfunction can occur, causing pain and various conditions such as hypertension, heart disease, urinary incontinence, and gastrointestinal disorders.

Compared to traditional drug therapies, directly transmitting signals to target organs through peripheral nerve stimulation can effectively avoid potential side effects of intermediary processes. In recent years, medical devices that modulate nerve-organ interactions have become increasingly important therapeutic options, known as “bioelectronic medicine.”

However, current commercial devices struggle to replicate the firing patterns of nerve fibers in healthy individuals. This difficulty primarily arises from our limited understanding of the physiological functions of peripheral neural circuits and the distribution and pathways of individual nerve fibers.

Developing nerve stimulation devices based solely on general descriptions of the peripheral nervous system, such as the sympathetic and parasympathetic systems, is inadequate. The nervous system is highly heterogeneous, and every detail can affect the efficacy and safety of treatment. For instance, the vagus nerve in the neck contains approximately 100,000 nerve fibers. The thick A and B fibers have diameters 10-20 times those of the unmyelinated C fibers and much lower electrical stimulation thresholds, so they are recruited at far weaker currents than the thin fibers. Current technologies cannot selectively stimulate the thin unmyelinated C fibers without also activating the thick fibers, an aspect often overlooked in the design of therapeutic strategies.

To develop more effective therapeutic strategies, we must deeply understand the neural mapping and functional interactions of the nervous system and organs at a systemic level, along with the mechanisms of neural regulation during disease progression. This comprehensive understanding can lead to fundamental interventions and treatments, opening new avenues for chronic disease management.

 Figure 1. Acupuncture, a characteristic Chinese neuroregulatory method. Source: Wikipedia

Within this medical framework, acupuncture (Fig.1), a traditional Chinese treatment method with over two thousand years of history, enters the modern medical research field with its unique approach. The core idea behind this treatment is that stimulating specific body areas (acupoints) can remotely regulate organ physiological functions. Randomized clinical trials have shown that needling specific body parts (acupoints) can effectively treat gastrointestinal motility disorders, stress urinary incontinence, and chronic pelvic pain.

According to traditional Chinese medicine (TCM) theory, the functional connection between somatic tissues and organs is mediated by “meridians.” However, modern research has yet to provide evidence supporting the physical existence of these “meridians” [1]. Systematically understanding the neural mapping and functional interactions in acupuncture, particularly its regulatory mechanisms during disease progression, will accelerate the modernization of acupuncture research.

 

2. Acupuncture Drives Specific Autonomic Neural Circuits: The Neural Basis of Treating Inflammation

The neural basis of acupuncture’s anti-inflammatory effects begins with the activation of somatic-autonomic reflexes. This process originates from somatosensory neurons located in the dorsal root ganglia or trigeminal ganglia. Activation of these neurons triggers a cascade of signal transmission through the spinal cord to the brain, ultimately activating peripheral autonomic nervous systems, including the sympathetic and parasympathetic neural circuits that regulate physiological processes (Fig.2).

 

 Figure 2. The autonomic nervous system. Source: christopherreeve

(1) Mechanisms of Acupuncture-Activated Neural Reflexes

Recent research has demonstrated the diverse effects of electroacupuncture, which can drive various autonomic neural circuits and dynamically regulate systemic inflammation by modulating immune cell activity [1]. For example, in 2000, Tracey and colleagues used electrical stimulation of the cervical vagus nerve to inhibit systemic inflammation (Fig.3). This process is partially achieved by activating splenic sympathetic neurons, although the exact pathway of sympathetic neuron activation remains controversial.

With the advancement of modern molecular and genetic tools, our understanding of somatic-autonomic reflex circuits has become more refined [1]. Past studies of sympathetic neurons relied on chemical or surgical approaches, which made it difficult to investigate their functional heterogeneity.

The emergence of molecular profiling of sympathetic neurons allows researchers to use intersectional genetic tools to label, ablate, or silence specific neuron subtypes. For instance, in the spleen, the largest immune organ, most sympathetic neurons can be labeled using the NPY-Cre driver. In a model of lipopolysaccharide (LPS)-induced systemic inflammation, NPY-positive noradrenergic splenic sympathetic neurons were found to function as an endogenous anti-inflammatory system: ablating these cells or knocking down the relevant molecules exacerbates splenic inflammation.

Figure 3. Hypothetical diagram of central autonomic circuits involved in acupuncture-mediated anti-inflammatory effects. Red lines indicate autonomic neural circuits that have been shown to affect acupuncture’s efficacy; blue lines indicate known autonomic physiological pathways that may mediate acupuncture’s effects but are yet to be confirmed. NTS, nucleus tractus solitarius; DMV, dorsal motor nucleus of the vagus; AMB, nucleus ambiguus. Source: Li, Yan-Wei, et al. “The autonomic nervous system: a potential link to the efficacy of acupuncture.” Frontiers in neuroscience 16 (2022): 1038945.

(2) Mechanisms of High-Intensity Electrical Stimulation

The NPY peptide released from sympathetic neurons can also regulate splenic immune responses. Experiments have shown that high-intensity electrical stimulation (1-3 mA) of the abdominal ST25 acupoint can effectively drive the somatic-spinal-splenic sympathetic circuit, significantly inhibiting LPS-induced systemic inflammation, relying on peripheral NPY-positive splenic noradrenergic neurons. In contrast, low-intensity (0.5 mA) electrical stimulation is insufficient to trigger the same response. High-intensity electrical stimulation of hindlimb acupoints, such as ST36, can also inhibit systemic inflammation through NPY-positive splenic sympathetic neurons rather than vagal reflexes.

The function of this somatic-splenic sympathetic reflex is dynamically regulated by the relative expression levels of anti-inflammatory β2-adrenergic receptors and pro-inflammatory α2-adrenergic receptors. Under normal physiological conditions, splenic immune cells mainly express high levels of β2 receptors; however, after LPS exposure, the enhanced expression of α2 receptors can reduce or even reverse the anti-inflammatory effects of high-intensity electroacupuncture, promoting LPS-induced systemic inflammation.

 

Figure 4. Inhibition of systemic inflammation by electrical stimulation of the cervical vagus nerve. Source: reference [2].

(3) Mechanisms of Low-Intensity Electrical Stimulation

Unlike high-intensity electrical stimulation, which directly activates the somatic-sympathetic neural circuit, recent studies have also revealed how low-intensity electrical stimulation effectively drives the somatic-vagal-adrenal anti-inflammatory neural circuit [1].

Torres-Rosas and colleagues first reported that 4V electroacupuncture stimulation of the ST36 acupoint produces an anti-inflammatory effect, dependent on both vagal reflexes and adrenal catecholamine release. Subsequent studies using genetic markers and cell ablation techniques confirmed the involvement of NPY-positive chromaffin cells in this anti-inflammatory response.

Further experiments demonstrated that low-intensity electroacupuncture (0.5 mA) can evoke a vagal-adrenal neural reflex from the hindlimb ST36 acupoint but not from the abdominal ST25 acupoint [3]. Even high-intensity electroacupuncture (3 mA) at the ST25 acupoint failed to activate vagal parasympathetic output neurons in the dorsal motor nucleus of the vagus (DMV), indicating significant acupoint selectivity in the activation of vagal reflexes.

This reflex effectively mitigates LPS-induced systemic inflammation, significantly reducing symptoms and protecting experimental mice from sepsis-induced death. It operates regardless of disease state: electroacupuncture delivered either before or after the peak of the LPS-induced cytokine storm produces an anti-inflammatory effect [3]. Together, these studies suggest that electroacupuncture can regulate systemic inflammation through different autonomic neural pathways, depending on acupoint selection, stimulation intensity, and disease state.
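As a schematic summary of the findings above, the sketch below encodes the reported acupoint-and-intensity dependence as a simple lookup. It is a deliberate simplification of the cited mouse experiments, not a clinical protocol; the function name and the 0.5 mA / 1 mA cut-offs are illustrative.

```python
# Schematic summary of the reported acupoint/intensity dependence (simplified).
def predicted_circuit(acupoint: str, intensity_mA: float) -> str:
    if acupoint in ("ST36", "LI10"):              # limb acupoints
        if intensity_mA <= 0.5:
            return "vagal-adrenal anti-inflammatory reflex"
        return "splenic sympathetic (NPY+ noradrenergic) circuit"
    if acupoint == "ST25":                        # abdominal acupoint
        if intensity_mA >= 1.0:
            return "somatic-spinal-splenic sympathetic circuit"
        return "no effective anti-inflammatory reflex"
    return "unknown"

for point, amp in [("ST36", 0.5), ("ST36", 3.0), ("ST25", 0.5), ("ST25", 3.0)]:
    print(point, amp, "mA ->", predicted_circuit(point, amp))
```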

(4) Why Does Acupuncture Drive Specific Autonomic Neural Circuits?

Although the research above provides valuable insights into how reflex circuits such as the somatic-vagal-adrenal axis are activated, the question of why such reflexes can be triggered only from limb regions remains unanswered. Advances in single-cell RNA sequencing of dorsal root ganglion neurons offer hope for addressing it. These studies show that the composition of dorsal root ganglion neuron subtypes differs between the limbs and the trunk; for example, proprioceptors are markedly expanded in limb dorsal root ganglia compared with thoracic ones.

Recent research by Liu et al. identified a group of neurons enriched in limb dorsal root ganglia, marked by prokineticin receptor 2 (ProkR2-Cre), which specifically drive the somatic-vagal-adrenal axis neural circuit (Fig.5) [4].

A large subpopulation of ProkR2-Cre neurons is myelinated and peptidergic, expressing neurofilament heavy chain protein (NEFH) and calcitonin gene-related peptide (CGRP). They are selectively distributed in deep fascia of limb regions, such as skeletal periosteum, joint ligaments, and fascia surrounding muscles near bones. Activation of these neurons is necessary and sufficient to drive the vagal-adrenal anti-inflammatory reflex [4]. The distribution of myelinated ProkR2-Cre nerve fibers can predict the effectiveness of low-intensity electroacupuncture in different body regions for driving the vagal-adrenal anti-inflammatory reflex [4], revealing the neuroanatomical basis of acupuncture driving specific autonomic neural circuits.

 

Figure 5. Neuroanatomical basis of acupuncture driving the vagal-adrenal anti-inflammatory pathway [3].

(5) What Factors Affect the Effectiveness of Acupuncture?

The effectiveness of acupuncture is strongly influenced by acupoint selection, which displays clear regional specificity along the body’s longitudinal axis. From the perspective of driving the vagal-adrenal axis, the contrast between effective limb acupoints such as ST36 and LI10 and ineffective abdominal acupoints or non-acupoint regions in the hindlimb muscles, as well as the effect of needling depth within effective acupoint regions, can all be explained by differences in the distribution density of myelinated ProkR2-Cre nerve fibers across tissues.

In abdominal fascia tissues, such as the peritoneum, myelinated ProkR2-Cre nerve fibers are almost absent, explaining why low-intensity electroacupuncture at the ST25 acupoint fails to effectively activate vagal reflexes or produce anti-inflammatory effects. In contrast, low-intensity electroacupuncture can sufficiently activate this anti-inflammatory reflex in the hindlimb ST36 and forelimb LI10 (Shousanli) acupoint regions, provided the acupuncture needle tip is close to the deep limb fascia rich in peroneal or radial nerves [4].

The distribution of sensory nerves along the skin-to-deep-tissue axis indicates that needling depth is another critical factor in the effectiveness of acupuncture, in line with classical Chinese medical theory. Previous developmental, anatomical, and functional studies have revealed two somatosensory circuits arranged along the skin and deep tissue [1]. Additionally, limb muscles are organized along this axis: slow-twitch type 1 fibers are enriched in the internal axial regions, while fast-twitch type 2 fibers are enriched in the external peripheral regions [1].

Notably, in the ST36 acupoint region located on the anterior side of the hindlimb, myelinated ProkR2-Cre-labeled nerve fibers that highly express NEFH are densely distributed in the internal region of the tibialis anterior muscle but are sparse in the external region and absent in the epidermis (Figure 6). Therefore, effective treatment in this area requires deeper needle insertion to activate the vagal-adrenal anti-inflammatory reflex.

On the posterior side of the hindlimb, large muscles such as the gastrocnemius (GC) and the semitendinosus (SD) in the thigh resemble the external region of the tibialis anterior in their developmental molecular expression and muscle fiber types, and are therefore often used as non-acupoint controls. However, myelinated ProkR2-Cre nerve fibers are only sparsely distributed within these large posterior hindlimb muscles, so 0.5 mA electrical stimulation there fails to produce an anti-inflammatory effect [4].

 

Figure 6. Diagram of the somatic-vagal-adrenal anti-inflammatory axis [1]. For cross-sectional illustration of the ST36 region of the hindlimb, light blue and yellow indicate the internal axial (I.M.) and external peripheral (O.M.) regions of the muscle, respectively. T.: tibia. F.: fibula. NST: nucleus tractus solitarius. DMV: dorsal motor nucleus of the vagus.

 

3. The SPARC Initiative in the United States: A New Impetus for Understanding the Mechanisms of Acupuncture

Understanding and manipulating the complexities of the nervous system is a key area of research in bioelectronic medicine. While individual research teams have made significant advancements, comprehending the interactions between nerves and organs requires broader collaboration and systematic approaches. The NIH’s SPARC (Stimulating Peripheral Activity to Relieve Conditions) initiative was created to meet this need.

The SPARC initiative funds interdisciplinary research projects aimed at achieving precise control over end-organ functions through the development of new neural stimulation devices and protocols.

Researchers under this initiative focus on mapping detailed interactions between nerves and organs and developing neuromodulation devices that effectively control and improve organ functions. They explore new applications for neuromodulation therapies through public-private partnerships and integrate SPARC-funded research data into an online resource called the SPARC Portal, which consolidates neuromodulation data, innovative technologies, and other research resources to promote new scientific discoveries and applications.

 https://ncats.nih.gov/research/research-activities/SPARC

In its first phase, SPARC supported the development of new tools and technologies, mapped connections between different neural and organ systems, and created a wealth of public resources available at sparc.science. These efforts provided cutting-edge information and tools to advance bioelectronic medicine.

Building on these achievements, the project has now entered its second phase. This phase focuses on the anatomical and functional connectivity of the human vagus nerve (SPARC-V), creating an open, standardized ecosystem of neuromodulation devices (SPARC-O), challenging the innovation community to demonstrate new capabilities (SPARC-X), and continuing to share data and digital resources through the SPARC Portal. These complementary initiatives aim to promote the development of the next generation of bioelectronic medicine therapies.

SPARC-V (Vagus Nerve Mapping and Physiology): 1. The Reconstructing Vagus Anatomy (REVA) project is creating a more precise and detailed map of the human vagus nerve. 2. The Vagus Nerve Stimulation Parameter Extraction (VESPA) project is determining the physiological effects of modulating vagus nerve activity to discover the best ways to stimulate nerve fibers for specific therapeutic outcomes.

SPARC-O (Open-Source Neuromodulation Technology): The Human Open Research Neural Engineering Technologies (HORNET) project is developing open-source technologies and components to safely and effectively alter neural function.

SPARC-X (Neuromodulation Prize): The Neuromodulation Prize is an exciting challenge competition to inspire proof-of-concept demonstrations of innovative bioelectronic medicine approaches that help patients.

Advancing Bioelectronic Medicine Resources: The SPARC Portal drives the development of bioelectronic medicine by providing open-access digital resources that can be shared, cited, visualized, computed, and used for virtual experiments.

 

4. Discussion and Prospects for Modern Acupuncture Research

Recent advances have revealed how acupuncture drives disease treatment through specific autonomic neural circuits. For example, to manage cytokine storms, limb acupoints like ST36 are selected for low-intensity electroacupuncture. This method selectively activates the vagal-adrenal anti-inflammatory reflex pathway by targeting myelinated ProkR2-Cre-labeled nerve fiber bundles that innervate the deep fascia of the limbs.

However, the somatic and autonomic nervous systems comprise numerous molecularly and functionally distinct cell types, whose interactions involve complex integrations within the spinal cord and brain. Thus, our current mapping of somatic-autonomic reflex circuits only scratches the surface [1].

For instance, the detailed mechanisms of somatosensory pathways driving most autonomic neural circuits, such as the sympathetic pathways to the stomach, spleen, and kidneys, and the parasympathetic efferent pathways innervating visceral organs, have yet to be fully characterized. Early studies on the analgesic effects of electroacupuncture indicate that stimulation frequency is another crucial parameter [1], but how this parameter influences somatic-autonomic reflexes remains largely unknown.

Additionally, acupuncture is typically applied under pathological conditions in clinical settings, fundamentally different from the normal physiological conditions under which laboratory studies are conducted. For example, visceral diseases can cause referred pain in skin areas, and needling these sensitive skin regions, traditionally known as Ashi points, is paradoxically used to treat visceral diseases [1]. However, with central sensitization causing skin hypersensitivity, electroacupuncture-induced somatic-autonomic reflexes may change dramatically, similar to how normally innocuous mechanical stimulation of the skin can become pain-inducing under pathological conditions.

Therefore, studying the plasticity of somatic-autonomic reflexes under various pathological conditions is crucial: it will help optimize stimulation parameters for treating disease remotely with acupuncture and provide new insights for therapeutic strategies. Moreover, viewed alongside the SPARC initiative in the United States, it is clear that future research must go beyond mapping the functional and anatomical neural circuits related to acupuncture in mice.

Broadly, we need to resolve the functional and anatomical neural circuit maps related to human acupuncture, building an open, standardized data resource and ecosystem. This includes functional anatomical neural maps, virtual neurofunctional anatomical models, regulatory devices and related parameters, and specific treatment protocols for diseases and symptoms. Achieving this requires interdisciplinary innovation and collaboration among researchers, clinicians, industry, and government sectors.

Mapping the Largest Fragment of the Human Brain https://admin.next-question.com/science-news/draw-brain-fragments/ https://admin.next-question.com/science-news/draw-brain-fragments/#respond Sat, 28 Sep 2024 00:07:13 +0000 https://admin.next-question.com/?p=2433 The human brain operates with remarkably low energy consumption while performing vast computational processes. Understanding this requires a deep comprehension of the synaptic connections, spatial structures, and static networks of neural circuits, along with the dynamic cognitive processes built upon them. These tasks far exceed the traditional scope of brain region functionality studies. Connectomics research aims to achieve this by precisely mapping how each cell connects with others.

With advancements in scientific research, especially the rapid improvements in computational and data processing capabilities, research methods for the nervous system are also evolving. In this context, technology giants like Google have begun investing in this field, applying advanced technologies to neuroscience research, thereby driving the rapid development of connectomics. Scientists have now mapped detailed connectomes of various organisms’ brains, rapidly changing our understanding of brain functioning.

Ten years ago, Google began applying its strengths in AI and large-scale data processing to this field, establishing a connectomics team. Reconstructing brain structures generates massive amounts of data. To handle it effectively, the Google team developed the flood-filling network (FFN) model to replace the traditional manual annotation of cells in images [1]. These networks can automatically trace and reconstruct neurons across tissue sections. Building on this, Google developed the SegCLR algorithm to automatically identify different cellular parts and cell types within these reconstructions [2]. The Google connectomics team also created TensorStore [3], an open-source C++ and Python software library for storing and managing large multidimensional datasets; its usefulness extends far beyond connectomics, and it is now widely used at Google and in the broader machine learning community.
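To give a flavor of how such petabyte-scale volumes are accessed in practice, here is a minimal sketch using TensorStore’s Python API. The dataset path is a placeholder rather than an actual release location, and the key point is that only the requested sub-volume is fetched from remote storage.

```python
# A minimal sketch of lazily reading a small chunk from a large EM volume stored in
# Neuroglancer's "precomputed" format; the path below is a placeholder, not a real dataset.
import tensorstore as ts

store = ts.open({
    "driver": "neuroglancer_precomputed",
    "kvstore": "gs://example-bucket/em-volume/",  # hypothetical dataset path
}).result()

# NumPy-style indexing is lazy: only the requested sub-volume is downloaded.
chunk = store[2048:2304, 2048:2304, 100:101].read().result()
print(chunk.shape)
```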

In 2018, Google partnered with the Max Planck Institute for Neurobiology in Germany to develop a deep learning-based system that can automatically map brain neurons [1]. They reconstructed the scanned images of one million cubic micrometers of a zebra finch brain.

 Illustration of segmenting each neuron in a small part of the zebra finch brain using FFN. Source: google.

In 2019, Google collaborated with the Howard Hughes Medical Institute and the University of Cambridge, applying the FFN model on TPU chips to a fruit fly brain that had been cut into thousands of 40-nanometer ultra-thin sections. A transmission electron microscope imaged each slice, producing over 40 trillion pixels of fruit fly brain images; these 2D images were then aligned and reconstructed into a complete 3D image of the fruit fly brain [4]. In 2020, Google released the fruit fly hemibrain connectome, which included images covering 25,000 neurons and revealed their synaptic connections, accounting for about one-third of the fruit fly brain by volume [5].

▷ Fruit fly hemibrain connectome.

Recently, Google collaborated with the Lichtman Laboratory at Harvard University to publish an article in Science titled “A petavoxel fragment of human cerebral cortex reconstructed at nanoscale resolution.”

▷ Shapson-Coe, Alexander, et al. “A petavoxel fragment of human cerebral cortex reconstructed at nanoscale resolution.” Science 384.6696 (2024): eadk4858.

This study released the latest H01 dataset, a 1.4 PB rendered image of a small sample of human brain tissue. The H01 sample was imaged at a resolution of 4nm using serial section electron microscopy and then reconstructed and annotated through automated computational techniques, ultimately revealing the preliminary structure of the human cerebral cortex. The H01 dataset is not only the largest sample of its kind imaged and reconstructed to this extent in any organism, but it is also the first large-scale sample to study the synaptic connectivity of the human cerebral cortex, spanning multiple cell types across all cortical layers.

 

▷ Figure 1: The H01 dataset of the human cerebral cortex.

The research team encountered many obstacles in creating this image. One major issue was finding suitable brain tissue samples. Since the human brain begins to degrade rapidly after death, postmortem brain tissue is often unsuitable for research. Therefore, the team chose a tissue sample taken during brain surgery from an epilepsy patient, aiming to help control their seizures. Additionally, identifying the structure of neural circuits at the synaptic level requires higher resolution electron microscopy. Serial EM, with its automation and rapid imaging capabilities, can image cubic millimeter volumes of tissue at nanoscale resolution, making it a powerful tool for reconstructing the fine neural circuits of the human brain.

The outcomes of this work primarily include the following four aspects:

 

1. Reconstruction of Human Brain Samples and Neuron Connectivity

The human brain sample used in this study was taken from the left anterior temporal lobe of an epilepsy patient during surgery. The sample was about the size of half a grain of rice. After fixation, staining, and embedding, the sample was sectioned using an automatic tape-collecting ultramicrotome (ATUM), producing 5,019 tissue sections with an average thickness of 33.9 nm, totaling 0.170 mm in thickness. These sections were then placed on silicon wafers and imaged under a custom-built 61-beam parallel scanning electron microscope at a resolution of 4×4 nm², rapidly capturing images with a total imaging volume of 1.05 mm³. The raw data size of each section reached up to 350 terabytes, and the overall dataset size reached 1.4 petabytes.

The subsequent computational challenge was to analyze such a massive dataset. The researchers successfully addressed this issue using artificial intelligence algorithms. The team computationally stitched and aligned this data to generate a single 3D volume. Despite the overall high quality of the data, the alignment pipeline needed to be robust enough to handle challenges such as imaging artifacts, missing sections, variations in microscope parameters, and physical stretching and compression of the tissue. Once aligned, a multi-scale FFN (flood-filling network) algorithm, utilizing thousands of Google Cloud TPUs, was applied to generate 3D segmentations of each individual cell in the tissue. FFN is the first automated segmentation technique capable of producing sufficiently accurate reconstructions. Additionally, other machine learning pipelines were used to identify and annotate 130 million synapses, dividing each 3D segment into different “subregions” (e.g., axons, dendrites, or cell bodies) and identifying other structures of interest, such as myelin and cilia.
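For intuition about the “flood-filling” idea mentioned above, the sketch below runs a classical flood fill on a toy 2D mask. FFN replaces the fixed neighbor rule here with a trained network that decides which neighboring voxels belong to the same neuron, so this is an analogy for the underlying idea, not Google’s actual model.

```python
# Classical flood fill on a toy 2D mask; illustrates "grow a segment outward from a
# seed," the idea that flood-filling networks generalize with a learned expansion rule.
from collections import deque
import numpy as np

def flood_fill(mask: np.ndarray, seed: tuple[int, int]) -> np.ndarray:
    """Grow a segment from `seed` over pixels where mask is True (i.e., inside a cell)."""
    segment = np.zeros_like(mask, dtype=bool)
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        if not (0 <= y < mask.shape[0] and 0 <= x < mask.shape[1]):
            continue
        if segment[y, x] or not mask[y, x]:
            continue
        segment[y, x] = True
        queue.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])
    return segment

toy = np.array([[1, 1, 0, 1],
                [1, 0, 0, 1],
                [0, 0, 1, 1]], dtype=bool)
print(flood_fill(toy, (0, 0)).astype(int))
```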

However, the results of automated reconstruction were not perfect. Given the massive amount of neuronal structural data, it was impractical for individual lab members to manually proofread the entire dataset. Therefore, researchers from Princeton University and the Allen Institute developed an online proofreading platform called CAVE (Connectome Annotation and Versioning Engine), as an extension of Neuroglancer. This platform allows users to proofread and update annotated data to improve the dataset as part of their research. Researchers can select specific neurons or their connections and manually refine and annotate their reconstructions. CAVE is already used for smaller datasets, including the fly connectome.

For this project, researchers updated the algorithms to handle such large datasets. The interface permits proofreading, which is a crucial final step in completing the connectome reconstruction. Anyone can apply to become a proofreader, and proofreaders can download data related to the cells they are working on for subsequent analysis. Additionally, VAST (Volume Annotation and Segmentation Tool) is available as a general free software for viewing, segmenting, and annotating large datasets. For diversified analysis, researchers also provided various databases allowing users to query cell and synapse data specifically. To support complex query conditions, they developed a standalone program called CREST (Connectome Reconstruction and Exploration Simple Tool), which identifies and explores cell connections based on many cell characteristics.

▷ Source: Viren Jain
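For readers who want to explore such reconstructions interactively, the Neuroglancer Python package can host a local viewer. The sketch below is a generic example with placeholder “precomputed://” source URLs, not the actual H01 data locations.

```python
# A minimal sketch of launching a Neuroglancer viewer from Python; the source URLs
# are placeholders, not real dataset paths.
import neuroglancer

neuroglancer.set_server_bind_address('127.0.0.1')
viewer = neuroglancer.Viewer()

with viewer.txn() as state:
    # An EM image layer and its automated segmentation, served in the
    # "precomputed" format that Neuroglancer reads natively.
    state.layers['em'] = neuroglancer.ImageLayer(
        source='precomputed://gs://example-bucket/em')
    state.layers['segmentation'] = neuroglancer.SegmentationLayer(
        source='precomputed://gs://example-bucket/segmentation')

print(viewer)  # prints a local URL to open in a browser
```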

2. Composition of Cells and Synapses in the Human Cerebral Cortex

Through identification and quantitative analysis of all nucleated cells in the sample, 49,080 neurons and glial cells, along with 8,100 vascular-related cells, were identified. The number of glial cells (32,315) was about twice that of neurons (16,087). These neurons were mainly divided into spiny (65.5%), non-spiny, and non-pyramidal types. The neuronal density in the human cerebral cortex is about 16,000/mm³, which is approximately one-third lower than previously estimated by optical microscopy in the human temporal cortex and nearly ten times lower than the density in the mouse associative cortex.

Based on the size and distribution density of the cell bodies, the researchers divided the sample into six cortical layers and white matter. The distribution of glial cells varied among these layers. Notably, protoplasmic astrocytes were densely distributed from layers 2 to 6. However, in the shallower part of layer 1, they tended to be more mixed, with higher density and smaller size. Fibrous astrocytes in the white matter were more elongated than those occupying the cortex. Two other types of glial cells, microglia and oligodendrocyte precursor cells (OPCs), had nearly uniform density across all layers. Another type of glial cell, oligodendrocytes (n=20,139), exhibited a gradient distribution, with the lowest density in the upper layers and the highest density in the white matter, likely related to their role in myelination. Microglia, OPCs, and oligodendrocytes all showed an affinity for blood vessels. A large number of myelinated axons were radially distributed between the white matter and the superficial cortical layers, horizontally distributed on two orthogonal axes. The reconstructed blood vessels did not show layer-specific distribution.

Synapses are the bridges of the neural network. The human brain has 86 billion neurons, and synapses allow the transmission of electrical signals from one neuron to the next. To identify synaptic sites, the researchers trained a classifier based on the U-Net architecture to label three categories: background, presynaptic, and postsynaptic. They also trained a binary ResNet-50 classifier to categorize each identified synapse as excitatory or inhibitory based on its electron microscopy appearance, postsynaptic structure type, and presynaptic neuron type (if known).
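As an illustration of the kind of classifier described here (not the authors’ actual pipeline or hyperparameters), the sketch below adapts a torchvision ResNet-50 to single-channel EM patches and runs one training step on a placeholder batch with binary excitatory-versus-inhibitory labels.

```python
# Sketch only: adapting ResNet-50 to classify single-channel EM patches as
# excitatory vs. inhibitory; data and labels below are random placeholders.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=None)  # EM patches differ from ImageNet photos, so no pretraining
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)  # 1-channel input
model.fc = nn.Linear(model.fc.in_features, 2)  # two classes: excitatory / inhibitory

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

patches = torch.randn(8, 1, 224, 224)   # placeholder batch of synapse-centered EM patches
labels = torch.randint(0, 2, (8,))      # placeholder labels

optimizer.zero_grad()
loss = criterion(model(patches), labels)
loss.backward()
optimizer.step()
print(float(loss))
```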

In the reconstructed human brain tissue, approximately 102.5 million (67.1%) synapses were identified as excitatory, and about 50.3 million (32.9%) as inhibitory (Fig.2). Further analysis showed that most of these synapses were located on dendrites (99.4%), with only a small portion on axon initial segments (0.2%) or cell bodies (0.4%). This data is accessible on the Neuroglancer platform. By analyzing the reconstructed synapses, the researchers found that excitatory synapses were mainly distributed in layers 1 and 3, while inhibitory synapses were predominantly found in layer 1. Additionally, the ratio of excitatory to inhibitory inputs received by pyramidal neurons and interneurons varied slightly across layers, reflecting the complex regulatory mechanisms of neural networks.

 Figure 2: Distribution of cells and synapses in the human cerebral cortex and white matter.

3. Morphological Subtypes of Layer 6 Triangular Neurons

The deeper layers of the cerebral cortex are less understood than the superficial layers due to the complexity and diversity of cell types present. In this study, the researchers focused on spiny neurons in the deep cortical layers, particularly triangular neurons, which are not well understood in primates. In the reconstructed human brain tissue, triangular neurons were predominantly located in layers 5 and 6 (n = 876), accounting for about one-third of the spiny neurons. These cells are characterized by their large basal dendrites extending from the soma in various directions, forming a notable directional distribution, thus referred to as “compass neurons.”

The study further distinguished two types of neurons based on the direction of their basal dendrites: one type with large basal dendrites extending anteriorly and another with large basal dendrites extending posteriorly at a mirror-symmetric angle. These two groups of cells formed a mirror-symmetric distribution in the cerebral white matter (Fig.3). Additionally, the direction of the basal dendrites exhibited a bimodal distribution, with most cells extending either significantly forward or markedly backward, while others were more tangentially oriented. The distribution of these basal dendrites corresponded to the functional regions of the cerebral cortex and might be related to the principal axis of the myelinated axons in the subjacent white matter. These neurons tended to cluster with nearest neighbor cells having the same orientation, suggesting that this spatial distribution is not random. The statistical correlation of these mirror-symmetric neurons indicates that such cell clustering might have some underlying, yet unclear, functional significance.

 

 Figure 3: Two mirror-symmetric subgroups of triangular neurons in the deep layers of the human cerebral cortex.

4. Multiple Synaptic Connections

Previous research has shown that axons in the rodent cerebral cortex occasionally form multiple synaptic connections on the same postsynaptic cell. To investigate whether the same phenomenon occurs in the human cerebral cortex, researchers used CREST to systematically identify strong connections in reconstructed human brain tissue. The results indicated that, in the human cerebral cortex, single axons occasionally establish multiple synapses on the same postsynaptic cell. Researchers systematically discovered this phenomenon in both excitatory and inhibitory axons. Although such strong connections are rare, they are extremely potent, with over 50 individual synaptic connections possible between pairs of neurons. Typically, 96.5% of axon-target cell contacts have only one synapse, while 0.092% of axon connections may have four or more synapses.

Furthermore, analysis of 2,743 neurons revealed that 39% of neurons had at least one dendritic input with seven or more synapses, indicating that strong axonal inputs are a common feature of human brain neurons. Interestingly, these multiple synaptic connections are often formed by axons maintaining close contact with dendrites over long distances (Fig.4).

The authors also explored whether the number of synapses established by an axon on target cells exceeded the expectations of a random model. The results showed that the incidence of strong connections observed was significantly higher than the model’s predictions (Fig.4). Although the phenomenon of single axons establishing multiple synapses on the same postsynaptic cell is uncommon in the human cerebral cortex, these rare, strong connections might be an important component of complex neuron-to-neuron communication.
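One simple way to picture such a comparison against a random model (a sketch only, not the authors’ exact null model; the observed counts and partner numbers below are made up) is to scatter each axon’s synapses at random among its potential partners and compare the tail of the resulting distribution with the observed one.

```python
# Toy comparison of observed synapses-per-connection counts against a random null model.
import numpy as np

rng = np.random.default_rng(0)

# Toy "observed" data: synapses per axon-target connection (not real H01 numbers).
observed = np.array([1] * 9650 + [2] * 300 + [4] * 9 + [7] * 2)

# Null model: each axon places its synapses independently and uniformly among its
# potential partner cells; we then count synapses per contacted partner.
n_axons, synapses_per_axon, partners_per_axon = 500, 20, 1000
null_counts = []
for _ in range(n_axons):
    targets = rng.integers(0, partners_per_axon, size=synapses_per_axon)
    counts = np.bincount(targets)
    null_counts.extend(counts[counts > 0])
null_counts = np.array(null_counts)

for k in (2, 4, 7):
    print(f">= {k} synapses per connection: "
          f"observed {(observed >= k).mean():.4%} vs null {(null_counts >= k).mean():.4%}")
```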

 Figure 4: Exceptionally strong synaptic connections in the human cerebral cortex.

Using nanoscale resolution to explore the six-layer structure of the human cortex, researchers revealed details of brain tissue at the cellular to subcellular level, investigating the complex relationships among neurons, synapses, glia, and blood vessels. Additionally, the authors provided several software tools, such as the browser-based Neuroglancer and its derivatives CREST and CAVE, as well as VAST, greatly facilitating data visualization and annotation.

However, the study has certain limitations. Fresh samples from healthy individuals are unlikely to be obtained through this neurosurgical approach. Although the patient’s temporal lobe did not show substantial pathological changes under light microscopy, long-term epilepsy or its drug treatment may have subtle effects on cortical tissue connectivity or structure.

Some unusual features were observed in this tissue, including some very large spines, axon varicosities filled with unusual substances, and a few axons forming extensive “axon helices.” It is currently unclear whether these rare structures are related to the disease or its treatment or are common features of the human brain. Comparing samples obtained from individuals with different underlying conditions will help us better understand these phenomena. Moreover, since the structure of the human cerebral cortex partly depends on past experiences, there is structural variability among individuals, but the extent of this variability is still unknown.

Considering the greater variability in human experience, behavior, and genetics, and the fact that humans and other vertebrates have many identified neuron categories rather than single identified neuron types, comparing neural circuits among human brains may be more challenging.

Commemorating Wittgenstein: How Does He Influence the Future of Large Language Models? https://admin.next-question.com/features/commemorating-wittgenstein-llm/ https://admin.next-question.com/features/commemorating-wittgenstein-llm/#respond Sun, 15 Sep 2024 09:17:00 +0000 https://admin.next-question.com/?p=2409  

Today marks the 73rd anniversary of the passing of philosopher Ludwig Wittgenstein (April 26, 1889 – April 29, 1951). In the vast sky of philosophy, Wittgenstein shines like an eternal star. Not only is he one of the greatest philosophers in human history, but his profound insights into the philosophy of language continue to radiate brilliance and stand the test of time. Despite the significant transformation in his thoughts from his early to later periods, which formed a complex intellectual system, his exploration of the relationship between language, thought, and reality continues to provide crucial theoretical support for today’s technological innovations, especially in the development of large language models.

This article aims to explore Wittgenstein’s philosophy in the context of large language models. By examining his ideas, we hope to gain a glimpse into his vast and profound sea of thought and hope his ideas can illuminate the bewilderment of modern people.

1. Against Private Language: The True Meaning of Language Lies in Communication

In 2017, before the advent of large models, Facebook’s AI research lab found that two chatbot agents in a negotiation dialogue experiment began to communicate in a language that was not human-readable [1]. Although these exchanges seemed meaningless, they still raised concerns about losing control over such systems. Today, with the emergence of more powerful AI technologies represented by large models, we seem to have more reason to be cautious about whether AI can develop languages that are completely incomprehensible to humans.

This kind of thinking brings to mind the concept of Wittgenstein’s “private language.” In his book Philosophical Investigations, Wittgenstein distinctly argues against the concept of private language. He defines it as follows:

“Can we imagine a language in which a person writes down or speaks of his inner experiences—his feelings, moods, and so on — for his own private use? — Cannot we do so in our ordinary language? — But that is not what I mean. The words of this language are to refer to what only the speaker can know—to his immediate private sensations. Thus, another person cannot understand the language.”

The key term here is “inner experiences.” When we see a chatbot saying, “Talking to so many people makes me tired,” we understand that this is a metaphorical expression. After all, algorithms do not get tired. However, how do we know that the word “tired” in this sentence does not carry its common meaning but instead indicates “insufficient server resources”? This requires reliance on context and situational factors.

In Wittgenstein’s words from Philosophical Investigations:

“If I say of myself that it is only from my own case that I know what the word ‘pain’ means,—must I not say the same of other people too? How can I generalize the one case so irresponsibly?”

[…]

Therefore, the problem of private language, as mentioned earlier, disappears. The meaning of the word “pain” is not the private sample (in our previous example, the feeling after slapping oneself) but rather its use in our language game. Of course, I now or in the future know the meaning of the word, which means I can use it correctly. Taking an extreme case, even if “one cannot retain in memory what the word ‘pain’ refers to—so always calling something else by that name—but still uses the word in accordance with the usual symptoms and criteria for pain!”

Since language itself is a practical activity, only continuously used communication tools can be considered language. Therefore, ancient scripts like cuneiform and oracle bone script, once used for communication, became cryptographic codes once their users disappeared. However, people can still decipher their meanings through various means, such as the AI model Diviner that can decipher oracle bone script [2].

Large models are trained on textual data used for public communication by humans and cannot access individual experiences, which limits their ultimate capabilities. Therefore, is it a fantasy to expect large models to delve into personal inner experiences that even humans cannot comprehend? The answer to this question depends on the following three propositions:

P1: There exist subjective experiences that are difficult or impossible to convey to others.

P2: An individual can develop a private language based on their unique subjective experiences, assigning words or symbols to these experiences, with the meanings of the words or symbols remaining stable due to their consistent relationship with the experiences.

P3: Private language is based on an individual’s subjective experiences rather than public language conventions.

Wittgenstein pointed out that the consistency between personal experiences and their private language is insufficient to establish stable meanings for words or symbols. Language is a rule-governed activity and requires public standards for validation. Therefore, we can weaken P2 and propose semi-private languages, such as the “Martian language” of China’s post-90s generation or the “Nüshu” script of Jiangyong, Hunan.

P2 (modified): An individual can develop a “semi-private language,” influenced by their unique subjective experiences while still partially based on public language conventions, allowing for shared understanding and verification within a specific group.

Wittgenstein’s argument against private language emphasizes the practical function of language, demonstrating that the true meaning of language lies in its interactive function, which is also what large language models should focus on. Describing human subjective experiences using machine-generated text is entirely a fantasy—our natural language cannot do it, and it is even more impossible for machine-generated text. These beautiful life experiences, such as the gentle babbling of a brook or the affectionate touch and embrace of a loved one, should not rely on AI to provide substitutes.

▷ Ludwig Josef Johann Wittgenstein: April 26, 1889 – April 29, 1951, one of the most influential philosophers of the 20th century, whose research areas mainly include logic, philosophy of language, philosophy of mind, and philosophy of mathematics. Wittgenstein’s philosophy is often divided into early and late periods, with the early period represented by Tractatus Logico-Philosophicus and the later period by Philosophical Investigations. Early Wittgenstein’s thoughts focused on the logical structure between the world and propositions, believing that all philosophical problems could be solved through such structures. However, late Wittgenstein’s thoughts denied most of the assumptions in the Tractatus, arguing that the meaning of words can only be better understood within a given language game.

At the end of this section—regardless of whether we are convinced by Wittgenstein that private language does not exist—let us conduct a thought experiment. Suppose private language exists, and individuals can have their private language based on subjective experiences, what would this mean for large language models?

Large models are trained on publicly available data, including texts, dialogues, and other multimodal content created through human interaction in the public domain. For any word in the large model to have meaning, the training data must encapsulate a mechanism that gives the word meaning, sharing a public validation process.

If private language indeed exists, it means that large models trained on public data will not have access to these private languages. In this case, the following impacts would occur:

(1) Limited understanding of subjective experiences: Large models would struggle to fully understand or accurately represent the nuances and complexities of individual subjective experiences. This limitation may lead to difficulties in reading and generating text related to highly personalized and subjective issues.

(2) Capability limitations of large models: Since large models can only access the public aspects of human language, this may limit their ability to capture the full range of language diversity and expression.

(3) Challenges in personalization: Large models may face challenges in creating highly personalized content because their training data does not include private languages or highly individualized expressions related to unique personal experiences.

(4) Ethical considerations regarding harmful content on the dark web: The dark web contains much harmful content, often with coded language that outsiders may not understand. If private languages exist, large models trained on dark web content or contaminated by it could pose moral risks.

People hope that AI can interact with humans in ways that align with mainstream values. At this point, the existence of private languages moves from purely philosophical speculation to a subject requiring quantitative empirical research, necessitating extensive data collection to ultimately demonstrate under what conditions private languages can exist and to what extent they can be private.

2. Language Games, Humor, and Reinforcement Learning

In Wittgenstein’s later views, “language games” become a core theme. Consistent with his argument against private language as previously mentioned, language games emphasize that language should be vibrant rather than isolated. In Wittgenstein’s words, “I shall also call the whole, consisting of language and the actions into which it is woven, the ‘language game’.” At the same time, a game is an open system. He wrote,

“How is the concept of a game to be closed off? What still counts as a game and what no longer does? Can you give a boundary? No. You can draw one; for none has so far been drawn.”

Let’s refer to some familiar examples from daily life to discuss language games, such as language-based board games like Werewolf. These games seem to have rules on the surface, that is, making other players believe what one says through speech. However, there are no clearly defined rules on how to play these games. High-level players might use agreed-upon jargon like “backstabbing wolf,” whose meanings come from the game’s context. How to play the game, in fact, has no boundaries.

Wittgenstein’s philosophical works were mostly written in German, where the word “Sinn” (“sense”) carries the dual senses of “feeling” and “meaning.” So when we say that the sense of a word depends on its context, do we mean its “feeling” or its “meaning”? In everyday terms, “feeling” can be likened to the finger pointing at the moon, while “meaning” is the moon itself. In games like Werewolf, “feeling” is a player’s declaration of their identity, and “meaning” is whether “this declaration is meant to increase or decrease my chances of winning.” Within the framework of language games, if we can reach the “moon” through the “finger,” then language has served its purpose. The question is: can a large model infer “meaning” (why a player says something) from “feeling” (what a player says)?

The answer to the above question is yes. Let’s first look at how a large model plays Werewolf [3]. The study below demonstrates multiple agents, played by ChatGPT, engaging in Werewolf. The results show that the agents can exhibit trust, confrontation, deception, and leadership behaviors similar to human players. During the game, the agents retrieve the most relevant experiences from their experience pool created based on previous game processes and extract suggestions from them to guide the reasoning and decision-making of the LLM.
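The retrieval step can be pictured with the sketch below, which is not the paper’s implementation: stored experiences are scored against the current game situation and the best match’s lesson is prepended to the agent’s prompt. A real agent would use an embedding model for similarity; plain word overlap is used here only to keep the example self-contained.

```python
# Toy "experience retrieval" for a Werewolf-playing agent (illustrative, not the paper's code).
experience_pool = [
    {"situation": "two players both claim to be the seer on day one",
     "lesson": "At most one claim can be true; probe how each reacts when challenged."},
    {"situation": "a quiet player suddenly pushes a vote against a confirmed villager",
     "lesson": "Sudden aggression toward known villagers often signals a werewolf."},
]

def similarity(a: str, b: str) -> float:
    """Stand-in for embedding similarity: Jaccard overlap of word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(1, len(wa | wb))

def retrieve_lesson(situation: str) -> str:
    best = max(experience_pool, key=lambda e: similarity(situation, e["situation"]))
    return best["lesson"]

situation = "two players claim to be the seer"
prompt = (f"Current situation: {situation}\n"
          f"Relevant past lesson: {retrieve_lesson(situation)}\n"
          f"Decide what to say next.")
print(prompt)
```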

▷ Figure 1. Trust relationships between agents, where each column represents a day in the game. As the game progresses, civilian players who initially do not know each other’s identities confirm that they belong to the same camp (with trust relationships). Source: Reference [3].

In another similar board game, Avalon, researchers further proposed the Recursive Thinking (ReCon) framework [4], which enables game-playing agents not only to judge the situation from their own perspective but also to consider “how other roles view my statements,” thereby uncovering the deception of other players. Practically, the value of this research lies in enabling large models to learn in environments containing deception and misinformation, thereby adapting better to real-world datasets that include such data.

▷ Figure 2. Schematic diagram of the Recursive Thinking method. Source: Reference [4].

Wei Lou wrote in Ten Lectures on Wittgenstein: “Do not ask ‘What is understanding?’, but ask how this word is used.” This indicates that based on the viewpoint of language games, we should consider the use and application of a word rather than focusing on its essence. For example, in the game of Werewolf, most players will claim to be good people. The meaning of “good person” here depends on how it is used in specific practice, rather than the essence of the word. By moving away from essentialist frameworks, methods like Recursive Thinking discussed earlier can identify deception and play the “language game” well.

In the book After Babel [5], it is stated that a common academic view on the source of human intelligence is that it comes from deception and strategizing in interactions. In the previously mentioned intelligent agents playing Werewolf and Avalon, after removing the higher-order thinking functions—meaning they no longer reassess “thinking” and speech content from the perspectives of other game participants—their performance on various metrics would decline [3-4]. So, can large model intelligent agents produce different kinds of intelligence through interactions containing deception (broadly defined as language games)?

Two recent studies can be referenced. One shows that fine-tuning large models with data from the “Weakly Intelligent Bar” (Ruozhiba) yields significantly better performance [6]. A possible explanation is that this data consists almost entirely of language games built on deception and misdirection. Understanding the gags and jokes from the Weakly Intelligent Bar requires second-order thinking. Introducing these datasets lets the intelligent agents learn, in a form akin to stand-up comedy, what deception and misdirection are, and thereby develop stronger reasoning abilities. Even in programming tasks whose style is vastly different from the Weakly Intelligent Bar’s text, the models fine-tuned with this special data achieved the best performance.

For example, the question “Why doesn’t high school directly enroll university students to improve the graduation rate?” is deceptive. It requires second-order thinking, i.e., considering “What would happen if high school directly enrolled university students?” to realize the definition of “high school.” Before this, in all training contexts, the word “high school” did not include the meaning of “must involve high school students.” The definition of this term needs to be modified with the introduction of this question.

▷ Figure 3. Example of a question from the Weakly Intelligent Bar.

Another study aims to enable intelligent agents to think creatively and generate humorous image captions, similar to internet memes [7]. This creativity comes from Leap-of-Thought (LoT), a counterpart to Chain-of-Thought (CoT) reasoning that explores remote associations; the authors also designed a series of screening steps to collect creative responses, which were then used to further train the model to generate humorous comments. This is similar to how AlphaZero uses self-play for reinforcement learning without relying on human data.

Wittgenstein’s last words were: “Tell them I’ve had a wonderful life.” This sentence can be interpreted in multiple ways within the framework of language games. It can be seen as a lifelong reinforcement learning process about “how to live happily” through the medium of language. This is similar to the Reinforcement Learning from Human Feedback (RLHF) process in modern ChatGPT, where language practice alters the evaluation and decision-making of other intelligent agents (whether Wittgenstein himself or large-scale models). Whether this can be regarded as a language game is a matter of opinion.

Wittgenstein once proposed: “A serious and good philosophical work could be written consisting entirely of jokes.” Humor can be seen as an advanced language game where participants give words new vitality in a new context through reflection. Nowadays, large models can generate humorous comments like the one below. The logic behind this is that the model can extract the part of the word “brother” that conveys closeness, typically used for living beings, and apply it cleverly to non-living entities.

▷ Figure 4. Meme and AI-generated creative comment example. Source: Reference [7], Chart: Cun Yuan.

In the future, we can expect large language models to summarize the common patterns among various jokes and the philosophical thinking behind them, much like Žižek [8]. More broadly, the unbounded nature of language games means that when intelligent agents (whether physical humans or non-physical large models) start interacting, it will be like peeling an endless onion, creating endless new contexts regardless of the focus area. In the future, we might expect large models not only to explain philosophical concepts through jokes but also to explain the fundamental principles of any discipline through humor. Large language models trained on specific philosophical terminology might even understand abstract philosophical concepts like “the experientiality of the Other.” Interested readers might consider constructing an AI intelligent agent capable of playing philosophical jokes as a “language game.”

Copy the link to view the conversation between the author and Kimi, the intelligent assistant: https://kimi.moonshot.cn/share/codpncbdf0j8tsi3i3eg

3. Family Resemblance and Word Vectors

Wittgenstein attempted to distinguish between “basic definitions” and “example definitions.” He opposed the pursuit of necessary conditions for words, arguing that basic definitions delineate not only the boundaries of a concept but also its core and existence. In contrast, his concept of “family resemblance” is a more ambiguous form of definition based on examples and similarities. It represents an example definition, where the application of a concept depends on a series of similar features rather than a set of strict necessary and sufficient conditions. As he wrote in Philosophical Investigations:

“I can think of no better expression to characterize these similarities than ‘family resemblances’; for the various resemblances between members of a family: build, features, color of eyes, gait, temperament, etc., etc. overlap and criss-cross in the same way.—And I shall say: ‘games’ form a family.”

According to the family resemblance theory, a large model’s grasp of concepts should also be seen as constantly evolving, rather than predefined in a formal manner like previous expert systems. One could say that the replacement of expert systems by large language models exemplifies Wittgenstein’s opposition to essentialism, an insight ahead of its empirical evidence.

▷ Figure 5. Left: Venn diagram of different language games, illustrating that the logical definition of language games is difficult since there are almost no shared attributes among all games. Right: Late Wittgenstein believed that fuzzy boundaries, where family resemblance lies, mean that words and their meanings are not fixed. Instead, they are growing, multifaceted, and ambiguous, requiring some intrinsic and unconscious similarity observations. Source: Wittgenstein’s Philosophy of Language. The Philosophical Origins of Modern NLP Thinking.

The similarity between words also appears in contemporary natural language processing, such as through word vectors generated by techniques like word2vec. These word vectors not only show the proximity of contextually similar words in vector space but also enable semantic operations such as “king – queen = man – woman,” thus echoing Wittgenstein’s concept of “family resemblance.”
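The analogy arithmetic mentioned above can be reproduced with off-the-shelf tools. The sketch below uses gensim’s downloadable GloVe vectors as a stand-in for word2vec; the model name is one of gensim’s bundled options, and the download happens on first use.

```python
# Word-vector analogies with pretrained GloVe vectors (a stand-in for word2vec).
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")  # downloaded on first use

# "king - man + woman ≈ queen": the offset between related word pairs is similar.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# Family-resemblance flavor: neighbors share overlapping contexts, not a strict definition.
print(vectors.most_similar("game", topn=5))
```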

▷ Figure 6. Left: context2vec architecture, bidirectional LSTM and cloze prediction objective. Right: context2vec embedding rules. Source: context2vec: Learning generic context embedding with bidirectional LSTM.

If we compare basic definitions to SQL database query languages, they retrieve knowledge from databases through precise targets. Example definitions, like word vectors, are akin to brushstrokes in Impressionist paintings, capturing data usage in a way that departs from traditional tabular structures. The complexity of reality means that many objects cannot be exhaustively listed with a precise, identifiable set of quantifiable attributes. Defining them through other similar objects is much easier.

Language models have proven capable of generating SQL queries from natural language [9] and of presenting open knowledge graphs [10]. This is a natural result of the language-game paradigm. Language games are not limited to someone directing others to make specific gestures or postures; they also include predicting outcomes under specific conditions by observing regular phenomena (such as the reactions of different metals with acids). In reinforcement learning tasks, large language models can extract knowledge about actions (how to do) and goals (what to do) from the current dialogue context [11]. That study also demonstrates that LLMs can appropriately balance pre-trained knowledge with new knowledge generated through dialogue. This behavior of revising one’s cognitive model on the basis of dialogue can also be seen as the “language practice” Wittgenstein described, and it is likewise reflected in the use of large models to play board games like Werewolf and Avalon.
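A minimal sketch of the text-to-SQL idea mentioned at the start of this paragraph looks like the following. Here `ask_llm` is a hypothetical stand-in for a model call and simply returns a canned query so the example runs end to end; a real system would prompt an LLM with the schema and the question.

```python
# Toy text-to-SQL pipeline; `ask_llm` is a placeholder for a real language-model call.
import sqlite3

def ask_llm(question: str, schema: str) -> str:
    # In a real system this would prompt an LLM with the schema and question.
    return "SELECT name FROM philosophers WHERE born < 1900;"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE philosophers (name TEXT, born INTEGER)")
conn.executemany("INSERT INTO philosophers VALUES (?, ?)",
                 [("Wittgenstein", 1889), ("Kripke", 1940)])

sql = ask_llm("Which philosophers were born before 1900?", "philosophers(name, born)")
print(conn.execute(sql).fetchall())   # [('Wittgenstein',)]
```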

▷ Figure 7. Related world knowledge for task χ enhances the characteristics of existing knowledge χ to improve the final prediction y. Source: Learning beyond datasets: Knowledge Graph Augmented Neural Networks for Natural language Processing.

For NLP researchers, the concept of language games means abandoning the idea that "understanding language can exist independently of its function and produce purely objective understanding." Current NLP benchmarks attempt to decompose "understanding" into manageable, evaluable units, typically involving prediction, string mapping, or classification (sentiment analysis, question answering, word sense disambiguation, coreference resolution, and so on), as in SuperGLUE or BIG-bench.

However, whether these tasks truly "use language" remains debatable. The researchers behind such test paradigms often assume there is a way of understanding language that transcends actual usage, treating understanding as a matter of mapping certain linguistic units onto other units and rarely tying language to real activity. Perhaps a better goal is not merely to build machines with "language understanding" capabilities, but to ask whether intelligent agents can change other interlocutors' evaluations and decisions through language, as in the earlier examples of decomposing robot tasks and balancing the influence of existing knowledge through language models.

4. Conclusion

Philosophers' ideas often run ahead of their times, especially those of geniuses. Many of the problems encountered in contemporary natural language processing can find vague yet incisive inspiration in Wittgenstein's numerous manuscripts and notes. The concept of family resemblance, for example, can ground approaches to understanding, modeling, and building natural language systems that do justice to the complexity of language. However, researchers today should not be content with theoretical deliberation alone; empirical research is needed to validate or refute these philosophical hypotheses. As discussed earlier, the author remains open on the question of private language, treating semi-private language as a possible intermediate state.

The limits of our language are the limits of our cognition, an echo of Wittgenstein's dictum that "the limits of my language mean the limits of my world." This idea is particularly relevant to contemporary large models, whose autoregressive nature makes it difficult for them to delimit their own boundaries, leading to problems such as hallucination and trouble with self-referential tasks like "construct a sentence with exactly ten words." Wittgenstein's philosophy will keep teaching us humility until NLP can capture the full complexity of language. His early work, the Tractatus Logico-Philosophicus, holds that "the limits of language are determined by the basic propositions, and the application of logic determines which propositions are basic. Logic cannot foresee what lies in its application," and on that basis declines to discuss beauty, religion, ethics, and the like, "delimiting all those things many people are chattering about by remaining silent about them."

Some current researchers of artificial general intelligence (AGI) seem to have forgotten this lesson, dwelling on areas that lack the support of basic propositions, such as consciousness and superintelligence. Wittgenstein's later thought pulls in the opposite direction: it brought language down from an idealized, abstract plane onto the practical, rough ground. Once we recognize that some things cannot be spoken of, our task may not be to avoid them, but to keep probing those unknown and inexpressible boundaries through a series of concrete multimodal language tasks, such as "interactive completion," "survival," and "seeking happiness." That process is also the beginning of endless progress.

Robert Hagstrom wrote in The Last Liberal Art: “The meaning of words is defined by their function in any language game. Wittgenstein did not believe in an omnipotent, independent logic existing in the world apart from what we observe but took a step back and considered that the world we see is defined and given meaning by the words we choose. In short, the world is created by us.”

Now it is no longer just humans who construct our world but a collaboration between humans and machines, with machines playing an ever larger role in the process. We need to understand and regulate representative generative AI systems, such as large language and vision models, with care, because they shape our cultural environment and influence our narratives. Each of us lives in a world we help create, and the mechanism of that creation has now been updated. We stand at a new starting point. No one is free to force everything into a predefined mold. Each of us must remain true to ourselves; ultimately, as Kant put it, people "legislate for nature" by understanding and taking part in the discussions surrounding large models.

 

 

]]>
https://admin.next-question.com/features/commemorating-wittgenstein-llm/feed/ 0
Who Says Intuition and Deliberation Are Incompatible? A New Perspective Based on Free Energy https://admin.next-question.com/uncategorized/who-says-intuition-and-deliberation-are-incompatible-a-new-perspective-based-on-free-energy/ https://admin.next-question.com/uncategorized/who-says-intuition-and-deliberation-are-incompatible-a-new-perspective-based-on-free-energy/#respond Fri, 06 Sep 2024 05:54:04 +0000 https://admin.next-question.com/?p=2393  

 

Cognitive science traditionally categorizes human behaviors and those of algorithm-driven agents into two types: one is goal-driven, like an explorer with a map who knows where they are headed; the other follows habits, like a student who consistently takes the same route to school. It has been generally accepted among researchers that these two types of behavior are governed by distinct neural mechanisms. However, a recent study published in Nature Communications proposes that both types of behavior can be unified under the same theoretical framework: variational Bayesian theory.

▷ Han, D., Doya, K., Li, D. et al. Synergizing habits and goals with variational Bayes. Nat Commun 15, 4461 (2024). https://doi.org/10.1038/s41467-024-48577-7

 

1. Question: The Same Domain, the Same Essence

In scientific research, whether studying animals, humans, or machine learning algorithms, we often encounter the question: When faced with a new environment or challenge, should we rely on instincts and habits, or should we try to learn new methods? This question may seem to span various fields, but in reality, these fields all revolve around a central issue: How to strike the optimal balance between rapid adaptation and maintaining flexibility.

For example, when animals face environmental changes, they may react instinctively by seeking shelter or searching for food, behaviors that may be innate or learned. Similarly, in decision-making, people sometimes rely on intuition (what psychologist Daniel Kahneman calls "System 1"), while at other times they engage in more careful deliberation ("System 2"). In machine learning, some algorithms are "model-free," learning from experience without predefined rules, while others are "model-based," relying on explicit rules and models.

▷ The characteristics of habitual behaviors (e.g., snacking while focused on work) and goal-directed behaviors (e.g., planning a diet meal).

These seemingly different situations are, in fact, quite similar: whether in biological or artificial systems, the key challenge is to balance rapid adaptation with flexibility in handling new situations.

For instance, bacteria quickly adapt to their environment through chemotaxis (the instinct to move toward nutrient-rich areas). When faced with an environment without pre-established rules, an intelligent agent may behave like a student immersed in solving problems, quickly reaching the goal. However, more complex organisms or algorithms may require more flexible strategies to tackle more sophisticated challenges.

To explain goal-directed learning, neuroscientists have proposed the theoretical framework of active inference. On this view, the brain constantly tries to minimize uncertainty and surprise in its predictions of the environment by directing the body to interact with it. The core quantity of the theory, "free energy," measures the mismatch between the sensory inputs actually received and the sensory inputs the brain's internal model expects. Active inference is, in essence, the process of minimizing free energy.
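As a point of reference, the standard variational form of this quantity (the generic formulation from the active-inference literature, not a formula taken from the paper discussed here) can be written for beliefs q(s) about hidden states s, observations o, and a generative model p(o, s) as:

```latex
F = \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big]
  = \underbrace{D_{\mathrm{KL}}\big[q(s)\,\|\,p(s \mid o)\big]}_{\text{approximation gap}}
  \;-\; \underbrace{\ln p(o)}_{\text{log evidence}}
```

Because the KL term is non-negative, minimizing F both pulls the internal beliefs toward the true posterior and, indirectly, raises the evidence the model assigns to what was actually sensed, which is why "minimizing free energy" and "minimizing surprise" are used almost interchangeably.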

While "active inference" provides insight into goal-directed learning, it remains a hypothesis that has yet to be fully validated in the scientific community. There is still insufficient empirical evidence to support the neural mechanisms behind it. For instance, active inference explains goal-directed behaviors as a process of minimizing the gap between goals and reality. However, it falls short in explaining habit-based behaviors that do not require conscious intervention or rely on external feedback.

Recent studies have attempted to demonstrate how these seemingly opposing behavioral modes, goal-directed and habit-driven, can work together within a unified theoretical framework, enabling organisms to efficiently and flexibly adapt to their environment.

 

2. Discovery: Predictive Coding and Complexity Reduction—The Brain's Ongoing Bayesian Inference

To better understand this new framework, we can liken the brain to a chef constantly experimenting with new dishes. When a chef adjusts his menu, is he making the dishes more appealing to customers or simply cooking out of habit? In reality, he is doing both. On the one hand, he reduces the gap between his culinary creations and the customers' tastes. On the other hand, he continuously updates and simplifies the predictive model of changing customer preferences—his expectations of their tastes.

This process can be illustrated with a simple example. A limited menu with only a few dishes may lack the flexibility to satisfy all customers' needs. In contrast, a complex menu that can be freely adjusted based on customer feedback might better accommodate different tastes, but it could become too complicated to manage, increasing costs or leading to inconsistent flavors.

Scientists have described this process mathematically, defining "latent intentions" to extend the concept of free energy. In this setting, free energy covers not only the quantity from active inference but also the agent's behavioral tendencies and its predictions about observations. The agent's learning can be viewed as a continual updating process (a Markov chain) that minimizes a quantity Z_t, which comprises a prediction-error term (the discrepancy with reality) and a KL-divergence term (the complexity of the model).
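Read literally, that description corresponds to an objective with one accuracy term and one complexity term. The expression below is a generic reconstruction consistent with the article's wording, with o_t, s_t, and a_t standing for observation, latent intention, and action; it is not necessarily the exact notation used in the paper.

```latex
Z_t \;=\; \underbrace{-\,\mathbb{E}_{q(s_t \mid a_t)}\big[\ln p(o_t \mid s_t)\big]}_{\text{prediction error}}
\;+\; \underbrace{D_{\mathrm{KL}}\big[q(s_t \mid a_t)\,\|\,q(s_t)\big]}_{\text{model complexity}}
```

When the action has no influence on the prediction, q(s_t | a_t) = q(s_t) and the KL term vanishes, which is the habit-driven, "model-free" end of the spectrum described below.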

British cognitive scientist Andy Clark has argued that the brain is a powerful prediction machine, constantly forecasting upcoming sensory inputs and adjusting those forecasts against what actually arrives. In this picture, the prediction error corresponds to the first term of the formula above. The second term, the KL divergence, measures the difference between the predictive distributions before and after an action, and so reflects the complexity of the predictive model. In purely habit-driven learning, taking or not taking an action does not change the prediction, so this term is zero: effectively, no model is in play. The KL divergence, as a measure of model complexity, thus turns the binary distinction between model-free and model-based approaches into a continuous spectrum.

▷ a) Schematic of integrating habits and goal frameworks. b) Structure of the framework during training. c) Structure of the framework during behavioral processes.

In this framework, when faced with a new goal, early learning resembles a model-based system, similar to a chef who, upon opening his restaurant, tries to optimize his predictions of new customers' tastes. Once this predictive model has been sufficiently refined through continuous training, it may shift toward a more habit-driven approach, where the chef continually perfects his signature dishes.

 

3. Significance: Enabling AI to Perform Zero-Shot Learning

One significant advantage of human intelligence is the ability to solve various tasks in entirely new environments without relying on prior examples. For instance, when a painter is asked to depict a mythical creature like a qilin, he or she only needs to know that the qilin is a symbol of good fortune. However, an AI would require specific prompts, such as: "Depict a qilin from ancient Chinese mythology, with a dragon's head, deer antlers, lion's eyes, tiger's back, bear's waist, snake's scales, horse's hooves, and an ox's tail, exuding an overall aura of majesty and sanctity, with gold and red as the primary colors, against a backdrop of swirling clouds, symbolizing good fortune, peace, and imperial power."

The first painter to depict a qilin did so through zero-shot learning after acquiring sufficient experience in painting; however, zero-shot learning remains a challenging task for current AI systems. This is precisely the issue that this framework aims to address.

When the environment changes, an agent built using the integrated framework proposed in this paper can spontaneously switch from habit-based model-free learning to model-based learning. This allows the agent to adapt to the new environmental conditions. In their experiments, researchers used a T-maze to test the adaptability of these agents. In this maze, the agent must decide which direction to take based on rewards on either side, learning strategies to maximize rewards.

▷ T-maze and the three stages of the agent in response to environmental changes.

In a habit-based system, the agent might keep following the previously rewarded path even after the reward has changed. Goal-directed agents face a different problem. For example, if the initial reward on the left side of the maze is 100 times greater than on the right, the agent might need to sample the left side on the order of a hundred times before its model updates enough for it to try the right side (depending on the specific agent model). This is plainly inefficient; in the real world, an organism behaving this way would likely be eliminated by natural selection. The framework proposed in this paper integrates the goal-driven and habit-driven approaches, allowing the agent to balance flexibility and speed: it first adapts to the environment by settling on the rewarded side, and when the environment changes and that reward disappears, it readjusts and switches to the other side.
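The toy simulation below is an illustrative sketch of the general idea, a habit prior blended with a learned reward model, not the actual agent architecture from the paper; the parameter names, the linear habit bonus, and the update rules are invented for clarity.

```python
import random

ARMS = ["left", "right"]

class BlendedAgent:
    """Toy agent mixing a habit prior with a goal-directed value model.
    Illustrative only; not the architecture from Han et al. (2024)."""

    def __init__(self, beta=1.0, lr=0.3, habit_lr=0.1, eps=0.1):
        self.value = {a: 0.0 for a in ARMS}   # goal-directed part: learned expected reward
        self.habit = {a: 0.5 for a in ARMS}   # habit part: prior tendency toward each action
        self.beta, self.lr, self.habit_lr, self.eps = beta, lr, habit_lr, eps

    def act(self):
        if random.random() < self.eps:        # occasional exploration
            return random.choice(ARMS)
        # Blend the learned value with a habit bonus; beta sets their trade-off.
        return max(ARMS, key=lambda a: self.value[a] + self.beta * self.habit[a])

    def learn(self, action, reward):
        # Goal-directed update: move the value estimate toward the observed reward.
        self.value[action] += self.lr * (reward - self.value[action])
        # Habit update: drift the prior toward whatever action was actually taken.
        for a in ARMS:
            target = 1.0 if a == action else 0.0
            self.habit[a] += self.habit_lr * (target - self.habit[a])

def t_maze(action, trial):
    """Reward sits on the left for the first 200 trials, then moves to the right."""
    rewarded = "left" if trial < 200 else "right"
    return 1.0 if action == rewarded else 0.0

agent = BlendedAgent()
for t in range(400):
    a = agent.act()
    agent.learn(a, t_maze(a, t))
print(agent.value, agent.habit)   # both shift toward "right" after the reward moves
```

Early in training the value term dominates and the agent behaves in a goal-directed way; once a habit has consolidated it answers cheaply from the prior, yet the value term lets it break the habit when the reward moves.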

The simple T-maze experiment shows that the new framework aligns with Yann LeCun's concept of the world model. LeCun emphasizes that a world model has a dual role of planning for the future and estimating missing observations, and it should be an energy-based model. In goal-directed behavior, this framework uses the current state, goal, and intended actions as inputs, and outputs an energy value to describe their "consistency." It can be said that the agent's decision-making in the T-maze builds upon and relies on the world model envisioned by LeCun.

There is a long road ahead from the simple T-maze to highly complex large language models. However, based on the theoretical framework described in this paper, we can observe some important similarities. For example, in training language models, we typically predict based on existing vocabulary, similar to scenarios without specific goals set during the training phase. The flexibility of goal-directed planning in this framework comes from its ability to break down any future goal into a series of sequential steps, predicting only the next observation. This method minimizes the discrepancy between goal-directed intentions and prior distributions, thereby compressing the search space and making the search process more efficient. This approach is also suitable for large models.

Additionally, based on the KL divergence term in the framework, we can understand the hierarchical structure in predictive coding, where a hierarchical information processing method is used to reduce model complexity. Predictive coding theory also suggests that the brain learns to recognize patterns by filtering out information that can be predicted through natural world patterns, thereby reducing unnecessary data. This information processing strategy echoes the information bottleneck theory, showing how cognitive processes can be optimized by using lower-dimensional representations.

Finally, this theoretical framework not only enhances our understanding of healthy brain function but also provides new perspectives for understanding and treating neurological disorders. For example, patients with Parkinson's disease often struggle with goal-directed planning abilities and rely more on habitual behaviors. This may be due to high uncertainty within goal-directed intentions. Research on how medical interventions or deep brain stimulation (altering internal states) and sensory stimulation (altering brain input) can reduce this uncertainty may provide methods to improve motor control in Parkinson's patients.

Moreover, research on autism spectrum disorder (ASD) can also benefit from this theoretical framework. Individuals with autism often exhibit repetitive behaviors, which may be related to an overemphasis on model complexity in predictive coding, affecting cognitive behavioral flexibility when adapting to changing environments. Introducing some randomness to increase behavioral diversity could be a potential intervention strategy.

 

 

]]>
https://admin.next-question.com/uncategorized/who-says-intuition-and-deliberation-are-incompatible-a-new-perspective-based-on-free-energy/feed/ 0
(Chinese) The Neural Mechanisms of Prediction Error | NextQuestion Top Journals https://admin.next-question.com/uncategorized/%e4%b8%ad%e6%96%87-%e9%a2%84%e6%b5%8b%e8%af%af%e5%b7%ae%e7%9a%84%e7%a5%9e%e7%bb%8f%e6%9c%ba%e5%88%b6/ https://admin.next-question.com/uncategorized/%e4%b8%ad%e6%96%87-%e9%a2%84%e6%b5%8b%e8%af%af%e5%b7%ae%e7%9a%84%e7%a5%9e%e7%bb%8f%e6%9c%ba%e5%88%b6/#respond Tue, 03 Sep 2024 08:49:32 +0000 https://admin.next-question.com/?p=2381 Sorry, this entry is only available in Chinese.

]]>
https://admin.next-question.com/uncategorized/%e4%b8%ad%e6%96%87-%e9%a2%84%e6%b5%8b%e8%af%af%e5%b7%ae%e7%9a%84%e7%a5%9e%e7%bb%8f%e6%9c%ba%e5%88%b6/feed/ 0
The “Thousand Brains Project” Officially Launched, Entering the AI Arena https://admin.next-question.com/features/the-thousand-brains-project-officially-launched-entering-the-ai-arena/ https://admin.next-question.com/features/the-thousand-brains-project-officially-launched-entering-the-ai-arena/#respond Fri, 30 Aug 2024 02:47:47 +0000 https://admin.next-question.com/?p=2373 In an era in which the field of artificial intelligence is fiercely competitive, a recent project known as "the AI initiative endorsed by Bill Gates" has captured public attention. This initiative, funded by the Gates Foundation, marks the official commencement of the "Thousand Brains Project."

The concept of "Thousand Brains" is not a novel one. As early as 2005, the project's leader, Jeff Hawkins, founded Numenta, a technology company aimed at creating biologically-inspired artificial intelligence. In 2021, Hawkins published a book titled A Thousand Brains.

In many ways, the emergence of the "Thousand Brains Project" is not surprising; it represents a natural progression, merging brain science theories with artificial intelligence as technology continues to advance. However, what piques our curiosity is: why now? Why the "Thousand Brains Project"?

▷ An illustration from Numenta’s website provides an introduction to the Thousand Brains Project.

 

1. Why Now?

Currently, the dominant force in artificial intelligence remains the traditional deep neural network. Each "neuron" in such a network is an independent computational unit that processes input data and works in tandem with the others to solve complex problems. These networks have been widely applied in areas such as image recognition and text prediction. When a neural network contains many layers of neurons, it is called a "deep" neural network. The concepts of "neurons" and the "layered" structure of these networks were originally inspired by the biological brain research of the last century. Those foundational neuroscience discoveries, however, date back decades, and the two fields have since diverged as each pursues its own increasingly specialized questions.

Today, deep neural networks surpass human performance in many tasks, from specialized applications like skin cancer detection to complex public games. Particularly after last year's revolutionary impact of large language models on public perception, AI momentum has become unstoppable. It seems that with vast amounts of data and powerful computational capabilities, the emergence of strong AI, long fantasized about, is inevitable. However, current AI systems also reveal significant shortcomings. For example, as large language models scale, their energy consumption becomes increasingly alarming. Additionally, experiments have demonstrated that neural networks are often unstable; minor perturbations in the input can lead to chaotic results, such as misidentifying objects by altering just a single pixel.

In light of these deficiencies, researchers are beginning to explore whether alternative approaches might enable further breakthroughs in AI. The "Thousand Brains Project" seeks to develop a new AI framework by reverse engineering the cerebral cortex. Hawkins remarked, "Today's neural networks are built on the foundations of neuroscience predating 1980. Since then, we have gained new insights in neuroscience, and we hope to use this knowledge to advance artificial intelligence."

The "Thousand Brains Project" is a testament to Hawkins' perseverance and accumulated knowledge, as well as another bold attempt by humanity to explore the possibilities of artificial intelligence in this era. But what is its true value and feasibility? To answer this, we must delve deeper into its underlying principles.

 

2. What is the "Thousand Brains Project"?

The "Thousand Brains Project" (TBP) was officially launched on June 5th at Stanford University's Human-Centered AI Institute. Jeff Hawkins, however, had been preparing for it for many years. According to the technical manual published on Numenta’s website, the Thousand Brains Project encompasses four long-term plans.

First, the core objective of the Thousand Brains Project is to develop an intelligent sensorimotor system. Its primary long-term goal is to establish a unified platform and communication protocol for such a system. This unified interaction protocol allows different custom modules to interact through a common interface. For example, a module designed for "drone flight optimized by bird’s-eye view" and another for "controlling a smart home system with various sensors and actuators" can interact according to TBP’s rules. This framework enables users to develop new modules based on their specific needs while ensuring compatibility with existing ones.

Second, to achieve universal interaction and communication, the Thousand Brains Project draws on neuroscience research related to the cerebral cortex. The project’s name is conceptually similar to "neural networks": the cerebral cortex consists of thousands of cortical columns, each divided into multiple layers of neurons. Numenta researchers believe traditional deep networks generate a single world model, processing data step by step from simple features to complex objects. In contrast, the "Thousand Brains Theory" suggests the brain integrates multiple world models generated by many cortical columns, as if each brain were operating thousands of brains in parallel.

The second goal of the Thousand Brains Project is to foster a new form of machine intelligence that operates on principles more closely aligned with how the brain learns, differing from today's popular AI methodologies. Communication between cortical columns—or different modules—is achieved through "long-range connections," mirroring the inter-regional communication observed in the cerebral cortex. Hawkins believes this modular structure will make his approach easily scalable, similar to the critical period in brain development when cortical columns are repeatedly replicated. However, Numenta notes in the technical manual that strict adherence to all biological details is not required in practical implementation. Instead, the Thousand Brains Project draws on concepts of neocortical function and long-range connectivity rather than rigidly following neurobiological specifics.

▷ Illustration of potential implementations from the Thousand Brains Project technical manual. Detailed annotations can be found at https://www.numenta.com/wp-content/uploads/2024/06/Short_TBP_Overview.pdf

 

Third, the AI developed through this project will also rely on research into "reference frames" in the cerebral cortex. In mammalian brains, place cells encode positional memory, and grid cells help map positions in space. The cerebral cortex uses these reference frames to store and understand the continuous stream of sensorimotor data it receives. The Thousand Brains Project aims to integrate these neuroscience discoveries into a cohesive framework. Hawkins explains, "The brain constructs data in two-dimensional and three-dimensional coordinate systems, reproducing the structure of objects in the real world. In contrast, deep networks do not fundamentally understand the world, which is why they fail to recognize objects when we make subtle changes to an image's features. By using reference frames, the brain can understand how an object’s model changes under different conditions."

Finally, we should recognize the Thousand Brains Project as essentially a software development toolkit. The developers aim for it to handle as diverse and varied a range of tasks as possible, facilitating communication, usage, and mutual application and testing among users.

In summary, the Thousand Brains Project is fundamentally a software development toolkit for robotic sensorimotor systems, incorporating certain neuroscience principles. Hawkins mentioned that potential applications of this new AI platform might include complex computer vision systems—such as analyzing what is happening in a scene using multiple cameras—or advanced touch systems to help robots manipulate objects.

"The Gates Foundation is very interested in sensorimotor learning to promote global health," Hawkins added. "For example, when ultrasound is used to image a fetus, it builds a model by moving the sensor, which is essentially a sensorimotor problem." Therefore, the Gates Foundation’s feasibility analysis of this project is conducted more from an engineering perspective.

 

3. How Does the "Thousand Brains Project" Work?

As previously mentioned, the "Thousand Brains Project" (TBP) relies primarily on two principles of how the brain processes information: "cortical columns" and "reference frames." Translated into engineering terms, these correspond to "modularization" and "reference systems." The TBP system consists of three main modules: the sensor module, the learning module, and the actuator module. These modules are interconnected and communicate through a unified protocol, so that each module exposes an appropriate interface while retaining a high degree of flexibility in its internal workings.
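To make the "unified protocol" idea concrete, here is a minimal sketch of what such module interfaces could look like. The class and field names below are hypothetical and chosen for illustration; they are not Numenta's actual API, which is specified in the project's technical documentation.

```python
from dataclasses import dataclass
from abc import ABC, abstractmethod

@dataclass
class Message:
    """Hypothetical common message format shared by all modules,
    spoken regardless of sensory modality."""
    features: dict   # e.g. {"color": "red", "curvature": 0.3}
    pose: tuple      # position/orientation of the sensed patch in a body-centred frame

class SensorModule(ABC):
    @abstractmethod
    def sense(self, raw_input) -> Message:
        """Convert modality-specific raw input into the common format."""

class LearningModule(ABC):
    @abstractmethod
    def step(self, msg: Message) -> dict:
        """Update internal object models and return a hypothesis,
        e.g. {"object_id": "mug", "pose": (...), "confidence": 0.8}."""

class MotorModule(ABC):
    @abstractmethod
    def drive(self, goal_state: dict):
        """Move the sensor/effector toward a target state proposed by learning modules."""

class ToyTouchSensor(SensorModule):
    def sense(self, raw_input) -> Message:
        pressure, location = raw_input   # one patch of "skin"
        return Message(features={"pressure": pressure}, pose=(location, 0.0, 0.0))

print(ToyTouchSensor().sense((0.7, 1.2)))
# Message(features={'pressure': 0.7}, pose=(1.2, 0.0, 0.0))
```

The design payoff of such a shared format is exactly what the text describes: a vision-based module and a touch-based module can be swapped or combined without either needing to know how the other works internally.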

In an interview, Hawkins stated, "Once we learn how to construct a cortical column, we can build more at will." So, how is each module realized within the TBP?

First, the sensor module is responsible for receiving and processing raw sensory input. According to the basic principles of the TBP, the processing of any specific modality (such as vision, touch, radar, or lidar) must occur within the sensor module. Each sensor module acts similarly to the retina, collecting information from a small sensory region—whether it's tactile information from a patch of skin or pressure data from a mouse's whisker. This localized raw data is converted into a unified data format by the sensor module and transmitted to the learning module, akin to how the retina converts light signals into electrical signals. Additionally, another critical function of the sensor module is coordinate transformation, which calculates the position of features relative to the sensor and the sensor relative to the "body," thereby determining the feature's position within the body's coordinate system. In summary, the sensor module transmits the current position and the external stimuli sensed at that position to the learning module in a generalized format.
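The coordinate transformation the sensor module performs can be illustrated with a minimal 2-D example: a feature observed in the sensor's own frame is mapped into the body frame by composing the sensor's pose (a rotation plus a translation) with the observation. The function name and the numbers below are arbitrary illustrative choices, not part of the TBP specification.

```python
import numpy as np

def to_body_frame(feature_in_sensor, sensor_pos_in_body, sensor_angle):
    """Map a 2-D feature position from the sensor's frame into the body's frame.
    sensor_angle is the sensor's orientation in the body frame, in radians."""
    c, s = np.cos(sensor_angle), np.sin(sensor_angle)
    rotation = np.array([[c, -s], [s, c]])
    return rotation @ np.asarray(feature_in_sensor) + np.asarray(sensor_pos_in_body)

# A fingertip sensor located at (0.30, 0.10) in the body frame, rotated 90 degrees,
# feels a feature 2 cm straight ahead of it, i.e. at (0.02, 0.0) in its own frame.
print(to_body_frame([0.02, 0.0], [0.30, 0.10], np.pi / 2))  # ~[0.30, 0.12]
```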

▷ The sensor module receives and processes raw sensory input, which is then transmitted to the learning module through a universal communication protocol, allowing the learning module to learn and recognize models of objects in the environment.

 

The learning module is the core component of the TBP, responsible for processing and modeling sensorimotor data received from the sensor module. Each learning module operates as an independent recognition unit, and when combined, they can significantly enhance recognition efficiency (for example, identifying a "cup" by touching it with five fingers is much faster than with just one). The input to the learning module can be feature IDs from the sensor module or object IDs from lower-level learning modules, but they are still processed as feature IDs. These features or object IDs can be discrete (such as "red," "cylinder," etc.) or represented in higher-dimensional spaces (such as sparse distributed representations of color). Additionally, the learning module receives position information relative to the "body," and this reference frame, centered on the module itself, integrates space into a unified computational framework.

Based on the described feature and position information, higher-level learning modules can construct "composite objects" (such as assemblies or entire scenes). Beyond learning new models by "independently learning features" and "using a unified reference frame," the learning modules also communicate with each other using a standardized communication protocol through lateral connections. This communication, like the learning module itself, is independent of specific modalities. Therefore, learning modules under different modalities can compete or collaborate to "reach a consensus."

After independent internal computation and interaction with other learning modules, a learning module can determine an object's ID and its location. It can update the object's model using recent observations, continually expanding its understanding of the world. The TBP stresses that learning and understanding are two interwoven processes.
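A voting mechanism among parallel learning modules can be sketched very simply: each module reports its current best object hypothesis with a confidence, and the system settles on the hypothesis with the greatest combined support. This toy tally is an illustration of the idea only, not the consensus algorithm specified in the TBP documentation.

```python
from collections import defaultdict

def vote(hypotheses):
    """hypotheses: list of (object_id, confidence) pairs, one per learning module."""
    support = defaultdict(float)
    for obj, conf in hypotheses:
        support[obj] += conf
    return max(support, key=support.get)

# Five "fingers" touch the same object; four modules lean toward "mug".
reports = [("mug", 0.9), ("mug", 0.7), ("bowl", 0.6), ("mug", 0.5), ("mug", 0.4)]
print(vote(reports))  # mug
```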

▷ The learning module uses reference frames to learn structured models through sensorimotor interactions. They model the relative spatial and temporal arrangement of incoming features.

 

Beyond the simplest "sensor to learning module" connection pattern, the universal communication protocol lets the system scale easily along several dimensions. Connecting multiple learning modules horizontally increases robustness through their interactions, while stacking them vertically enables more complex, hierarchical processing of inputs and the modeling of composite objects. In addition to this scaling across spatial scales, the TBP can even learn across different time scales: lower-level modules slowly learn and generalize the statistical features of their input, while higher-level modules quickly build momentary predictions of the current external state, serving as a form of short-term memory.

Each learning module produces a motor output, which takes the form of a "target state" that follows a universal communication protocol. The target state is calculated based on the learned model and certain assumptions, aiming to minimize uncertainty between different possible object models. In other words, the target state can guide the system's behavior toward the desired goal.

▷ By using a universal communication protocol between the sensor module and the learning module, the system can easily scale across multiple dimensions. This provides a straightforward approach to processing multi-modal sensory inputs. Parallel deployment of multiple learning modules can enhance robustness through a voting mechanism among them. Additionally, stacking learning modules enables more complex hierarchical processing of inputs and facilitates modeling composite objects.

 

The above description covers the basic functions and connections of the three modules; in practice, they can achieve more flexible connections and richer behaviors, as detailed in the technical manual. One particularly interesting feature is that the current learning modules can store learned models and use them to predict future observations. When these models are integrated to predict current observations, the prediction error can serve as input to update the model. This use of feedback signals is similar to patterns found in reinforcement learning, which is also reflected in the TBP.

However, the manual notes that real-time updates in the TBP predictions are not yet achievable. The future introduction of a time dimension to realize this functionality will greatly benefit object behavior encoding and motion strategy planning. For instance, this functionality could be applied to observing the continuous process of pressing a stapler or roughly simulating the physical properties of common materials. 

 

4. Why the "Thousand Brains Project"?

Returning to the initial inquiry: in an era where both artificial intelligence and brain-machine interfaces lead technological advancements, why has the Gates Foundation decided to provide $2.96 million in funding to the "Thousand Brains Project" over the next two years? Rather than assessing the judgment of the Gates Foundation's advisory team, I would like to offer my personal views on the significance and feasibility of the "Thousand Brains Project" as a conclusion to its introduction.

The "Thousand Brains Project" is a research initiative aimed at developing the next generation of artificial intelligence systems through biomimicry and neuroscience theories. On the one hand, by mimicking the functioning of the cerebral cortex, the project aims to develop intelligent systems capable of perception, learning, and executing actions. This has profound practical significance in fields such as healthcare and public health. These technologies can not only improve the quality of life and societal health standards but also effectively address various challenges and public health crises that future societies may face. Supporting the development and application of such technologies aligns with the expectation of a more intelligent and human-centered society.

Additionally, the "Thousand Brains Project" has far-reaching implications for the future of artificial intelligence and computer technology. Current deep neural networks typically require vast labeled datasets for training and struggle with real-time adaptability to dynamic environments. In contrast, the "Thousand Brains Project" aims to introduce hierarchical and parallel processing capabilities into AI systems through biomimetic methods, enabling more flexible, intelligent, and adaptive machine intelligence. The debate over whether to incorporate principles from the biological brain into artificial intelligence has been ongoing and remains complex. Regardless of the path chosen, the key challenge lies in effective implementation.

In terms of technical execution, the "Thousand Brains Project" aims to create software-based cortical columns that can achieve complex perception and action processes, such as vision and hearing, by connecting multiple units. This cross-modal integration will allow the system to process information from different sensory channels simultaneously, leading to a deeper understanding and interaction with the world.

This concept, while ambitious, is grounded in decades of dedication and persistence by Jeff Hawkins. Following the official launch, Numenta promptly launched a progress page for the "Thousand Brains Project," promising continuous updates on its developments. The launch marks the beginning of its journey into practical application rather than the end of the Thousand Brains Theory. Much uncertainty remains about its future development, and predicting its success or failure is impossible. However, it is precisely this uncertainty that makes scientific research so exciting. 

]]>
https://admin.next-question.com/features/the-thousand-brains-project-officially-launched-entering-the-ai-arena/feed/ 0