Represent Space and Time

  • Related Project: Private
  • Category: Paper Review
  • Date: 2023-10-20

Language Models Represent Space and Time

  • url: https://arxiv.org/abs/2310.02207
  • pdf: https://arxiv.org/pdf/2310.02207
  • abstract: The capabilities of large language models (LLMs) have sparked debate over whether such systems just learn an enormous collection of superficial statistics or a coherent model of the data generating process – a world model. We find evidence for the latter by analyzing the learned representations of three spatial datasets (world, US, NYC places) and three temporal datasets (historical figures, artworks, news headlines) in the Llama-2 family of models. We discover that LLMs learn linear representations of space and time across multiple scales. These representations are robust to prompting variations and unified across different entity types (e.g. cities and landmarks). In addition, we identify individual space neurons and time neurons that reliably encode spatial and temporal coordinates. Our analysis demonstrates that modern LLMs acquire structured knowledge about fundamental dimensions such as space and time, supporting the view that they learn not merely superficial statistics, but literal world models.


TL;DR


  1. Large language models learn representations of space and time
  2. Linear regression probes predict spatial and temporal coordinates
  3. Real-world locations and times are estimated from the model's internal activations

1 Introduction

Modern large language models have demonstrated impressive capabilities despite being trained only to predict the next token, which raises the question of what these models actually learn from their data. One hypothesis is that LLMs learn a large collection of correlations from the data without any real understanding of the underlying data-generating process. The alternative hypothesis is that, in the course of compressing the data, they learn a more compact and coherent generative model. This paper investigates whether, and how, large language models form models of space and time.


2 Experimental Overview

2.1 Space and Time Datasets

For this study, six datasets spanning multiple spatial and temporal scales were constructed, covering locations in the world, the United States, and New York City, as well as the death years of historical figures, the release dates of art and entertainment, and the publication dates of news headlines. Using the Llama-2 and Pythia model families, linear regression probes were trained on the internal activations for the names of these places and events to predict their real-world location (latitude/longitude) or time (numeric timestamp).

2.2 Models and Methods

All experiments were run with the autoregressive transformer language models of the Llama-2 series. For each dataset, the model was run over every entity and the hidden-state activations of the last entity token were saved. Linear regression probes were then fit on these activations to search for spatial and temporal representations.

2.3 Evaluation

Probe performance was evaluated with standard regression metrics such as R2 and Spearman rank correlation. In addition, the proximity error of each prediction was computed, since absolute error metrics can be misleading for spatial data.


3 Linear Models of Space and Time

3.1 Existence

The paper investigates whether models represent time and space at all, where in the model these representations live, and how representation quality changes with model scale. The results show a consistent pattern in which quality improves gradually through the early layers and then plateaus around the midpoint of the model.

3.2 Linear Representations

To test whether spatial and temporal features are represented linearly, linear regression probes were compared with more expressive nonlinear MLP probes. The nonlinear probes performed no better than the linear ones, strongly suggesting that space and time are linearly decodable.

3.3 Sensitivity to Prompting

Various prompts were used to examine how sensitive the spatial and temporal features are to prompting. Explicit prompting made little difference in performance.


4 Robustness Checks

4.1 Verification via Generalization

To distinguish whether the model truly represents space rather than merely country membership, probe generalization was analyzed while holding out specific blocks of data. Generalization performance drops, but remains clearly better than random.

4.2 Dimensionality Reduction

Probes trained on activations projected onto their principal components were analyzed, providing further evidence that the model explicitly represents space and time.

5 Space and Time Neurons

Individual neurons were identified whose activity shows that the model actually uses spatial and temporal features. These neurons respond to all entity types in the datasets.

6 Related Work

Prior work on how language models encode and use geographic information is reviewed; closely related work includes representation learning in models trained on chess and Othello games.

7 Discussion

The paper describes how LLMs learn and use representations of space and time, conjectures how these primitives contribute to a more comprehensive causal world model, and proposes directions for future work on how models learn and use these representations.


1 INTRODUCTION

Despite being trained to just predict the next token, modern large language models (LLMs) have demonstrated an impressive set of capabilities (Bubeck et al., 2023; Wei et al., 2022), raising questions and concerns about what such models have actually learned. One hypothesis is that LLMs learn a massive collection of correlations but lack any coherent model or “understanding” of the underlying data generating process given text-only training (Bender & Koller, 2020; Bisk et al., 2020). An alternative hypothesis is that LLMs, in the course of compressing the data, learn more compact, coherent, and interpretable models of the generative process underlying the training data, i.e., a world model. For instance, Li et al. (2022) have shown that transformers trained with next token prediction to play the board game Othello learn explicit representations of the game state, with Nanda et al. (2023) subsequently showing these representations are linear. Others have shown that LLMs track boolean states of subjects within the context (Li et al., 2021) and have representations that reflect perceptual and conceptual structure in spatial and color domains (Patel & Pavlick, 2021; Abdou et al., 2021). Better understanding of if and how LLMs model the world is critical for reasoning about the robustness, fairness, and safety of current and future AI systems (Bender et al., 2021; Weidinger et al., 2022; Bommasani et al., 2021; Hendrycks et al., 2023; Ngo et al., 2023).

In this work, we take the question of whether LLMs form world (and temporal) models as literally as possible—we attempt to extract an actual map of the world! While such spatiotemporal representations do not constitute a dynamic causal world model in their own right, having coherent multi-scale representations of space and time are basic ingredients required in a more comprehensive model.

Specifically, we construct six datasets containing the names of places or events with corresponding space or time coordinates that span multiple spatiotemporal scales: locations within the whole world, the United States, and New York City in addition to the death year of historical figures from the past 3000 years, the release date of art and entertainment from the 1950s onward, and the publication date of news headlines from 2010 to 2020. Using the Llama-2 (Touvron et al., 2023) and Pythia (Biderman et al., 2023) families of models, we train linear regression probes (Alain & Bengio, 2016; Belinkov, 2022) on the internal activations of the names of these places and events at each layer to predict their real-world location (i.e., latitude/longitude) or time (numeric timestamp).

These probing experiments reveal evidence that models build spatial and temporal representations throughout the early layers before plateauing at around the model halfway point, with larger models consistently outperforming smaller ones (§3.1). We then show these representations are (1) linear, given that nonlinear probes do not perform better (§3.2), (2) fairly robust to changes in prompting (§3.3), and (3) unified across different kinds of entities (e.g. cities and natural landmarks). We then conduct a series of robustness checks to understand how our probes generalize across different data distributions (§4.1) and how probes trained on the PCA components perform (§4.2). Finally, we use our probes to find individual neurons which activate as a function of space or time, providing strong evidence that the model is truly using these features (§5).

Figure 1: Spatial and temporal world models of Llama-2-70b. Each point corresponds to the layer 50 activations of the last token of a place (top) or event (bottom) projected on to a learned linear probe direction. All points depicted are from the test set.

2 EMPIRICAL OVERVIEW

2.1 SPACE AND TIME RAW DATASETS

To enable our investigation, we construct six datasets of names of entities (people, places, events, etc.) with their respective location or occurrence in time, each at a different order of magnitude of scale. For each dataset, we included multiple types of entities, e.g., both populated places like cities and natural landmarks like lakes, to study how unified representations are across different object types. Furthermore, we maintain or enrich relevant metadata to enable analyzing the data with more detailed breakdowns, identify sources of train-test leakage, and support future work on factual recall within LLMs. We also attempt to deduplicate and filter out obscure or otherwise noisy data.

Space We constructed three datasets of place names within the world, the United States, and New York City. Our world dataset is built from raw data queried from DBpedia (Lehmann et al., 2015). In particular, we query for populated places, natural places, and structures (e.g. buildings or infrastructure). We then match these against Wikipedia articles, and filter out entities which do not have at least 5,000 page views over a three year period. Our United States dataset is constructed from DBpedia and a census data aggregator, and includes the names of cities, counties, zipcodes, colleges, natural places, and structures where sparsely populated or viewed locations were similarly filtered out. Finally, our New York City dataset is adapted from the NYC OpenData points of interest dataset (NYC OpenData, 2023) containing locations such as schools, churches, transportation facilities, and public housing within the city.
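As a rough sketch of this kind of filtering step (my own illustration, not the authors' code; the file name and column names such as pageviews_3yr are hypothetical), the page-view threshold is a simple cut over per-entity metadata:

    import pandas as pd

    # Hypothetical dump of the DBpedia query results.
    places = pd.read_csv("world_places.csv")
    places = places.drop_duplicates(subset="name")
    # Keep only entities with at least 5,000 Wikipedia page views over three years.
    places = places[places["pageviews_3yr"] >= 5000]
    places = places[["name", "entity_type", "country", "latitude", "longitude"]]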

Time Our three temporal datasets consist of (1) the names and occupations of historical figures who died between 1000BC and 2000AD adapted from (Annamoradnejad & Annamoradnejad, 2022); (2) the titles and creators of songs, movies, and books from 1950 to 2020 constructed from DBpedia with the Wikipedia page views filtering technique; and (3) New York Times news headlines from 2010-2020 from news desks that write about current events, adapted from (Bandy, 2021).

Table 1: Entity count and representative examples for each of our datasets.

2.2 MODELS AND METHODS

Data Preparation All of our experiments are run with the base Llama-2 (Touvron et al., 2023) series of auto-regressive transformer language models, spanning 7 billion to 70 billion parameters. For each dataset, we run every entity name through the model, potentially prepended with a short prompt, and save the activations of the hidden state (residual stream) on the last entity token for each layer. For a set of n entities, this yields an n × dmodel activation dataset for each layer.
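As a rough sketch of this step (my own illustration, not the authors' released code; the checkpoint name, prompt handling, and the helper last_token_activations are assumptions), the per-layer activation dataset can be collected with Hugging Face transformers roughly as follows:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL = "meta-llama/Llama-2-7b-hf"   # any Llama-2 checkpoint (gated on Hugging Face)
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float16, device_map="auto")
    model.eval()

    def last_token_activations(entity_names, layer):
        """Return an (n, d_model) array of residual-stream activations on the last entity token."""
        rows = []
        for name in entity_names:
            inputs = tokenizer(name, return_tensors="pt").to(model.device)
            with torch.no_grad():
                out = model(**inputs, output_hidden_states=True)
            # hidden_states[layer] has shape (1, seq_len, d_model); take the final token
            rows.append(out.hidden_states[layer][0, -1].float().cpu())
        return torch.stack(rows).numpy()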

Probing To find evidence of spatial and temporal representations in LLMs, we use the standard technique of probing (Alain & Bengio, 2016; Belinkov, 2022), which fits a simple model on the network activations to predict some target label associated with labeled input data. In particular, given an activation dataset A ∈ R^(n×d_model) and a target Y containing either the time or two-dimensional latitude and longitude coordinates, we fit linear ridge regression probes yielding a linear predictor Ŷ = AŴ. High predictive performance on out-of-sample data indicates that the base model has temporal and spatial information linearly decodable in its representations, although this does not imply that the model actually uses these representations (Ravichander et al., 2020). In all experiments, we tune λ using efficient leave-one-out cross validation (Hastie et al., 2009) on the probe training set.
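A minimal probe-fitting sketch (not the paper's code; A and Y are assumed to come from the data-preparation step above, and the alpha grid and split are arbitrary choices). scikit-learn's RidgeCV tunes the regularization strength with efficient leave-one-out cross-validation, matching the procedure described here:

    import numpy as np
    from sklearn.linear_model import RidgeCV
    from sklearn.model_selection import train_test_split

    # A: (n, d_model) activations from one layer; Y: (n, 2) lat/lon or (n, 1) timestamps.
    A_train, A_test, Y_train, Y_test = train_test_split(A, Y, test_size=0.2, random_state=0)

    # RidgeCV tunes lambda with efficient leave-one-out cross-validation.
    probe = RidgeCV(alphas=np.logspace(-1, 6, 15))
    probe.fit(A_train, Y_train)
    Y_hat = probe.predict(A_test)        # linear predictor Y_hat = A W_hat
    print("out-of-sample R^2:", probe.score(A_test, Y_test))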

2.3 EVALUATION

To evaluate the performance of our probes we report standard regression metrics such as R2 and Spearman rank correlation on our test data (correlations averaged over latitude and longitude for spatial features). An additional metric we compute is the proximity error for each prediction, defined as the fraction of entities predicted to be closer to the target point than the prediction of the target entity. The intuition is that for spatial data, absolute error metrics can be misleading (a 500km error for a city on the East Coast of the United States is far more significant than a 500km error in Siberia), so when analyzing errors per prediction, we often report this metric to account for the local differences in desired precision.
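The sketch below is one literal reading of that proximity-error definition (my own illustration; Y_test and Y_hat come from the probing sketch above, and plain Euclidean distance stands in for geodesic distance):

    import numpy as np
    from scipy.stats import spearmanr
    from sklearn.metrics import r2_score

    def proximity_error(Y_true, Y_pred):
        """For each entity, the fraction of all predictions landing closer to its true
        point than its own prediction does (0 = best possible, 0.5 = random)."""
        errs = []
        for i in range(len(Y_true)):
            d_all = np.linalg.norm(Y_pred - Y_true[i], axis=1)   # every prediction vs. target i
            d_own = np.linalg.norm(Y_pred[i] - Y_true[i])
            errs.append(np.mean(d_all < d_own))
        return np.array(errs)

    print("R^2:", r2_score(Y_test, Y_hat))
    rho = np.mean([spearmanr(Y_test[:, j], Y_hat[:, j]).correlation for j in range(Y_test.shape[1])])
    print("Spearman (averaged over coordinates):", rho)
    print("mean proximity error:", proximity_error(Y_test, Y_hat).mean())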

3 LINEAR MODELS OF SPACE AND TIME

3.1 EXISTENCE

We first investigate the following empirical questions: do models represent time and space at all? If so, where internally in the model? Does the representation quality change substantially with model scale? In our first experiment, we train probes for every layer of Llama-2-{7B, 13B, 70B} and Pythia-{160M, 410M, 1B, 1.4B, 2.8B, 6.9B} for each of our space and time datasets. Our main results, depicted in Figure 2, show fairly consistent patterns across datasets. In particular, both spatial and temporal features can be recovered with a linear probe, these representations smoothly increase in quality throughout the first half of the layers of the model before reaching a plateau, and the representations are more accurate with increasing model scale. The gap between the Llama and Pythia models is especially striking, and we suspect is due to the large difference in pre-training corpus size (2T and 300B tokens respectively). For this reason, we report the rest of our results on just the Llama models.

Figure 2: Out-of-sample R2 for linear probes trained on every model, dataset, and layer.
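The layer-by-layer picture behind Figure 2 can be reproduced in miniature with a loop like the following (a sketch under the same assumptions as the earlier snippets; num_layers and entity_names are placeholders):

    # Sweep layers, fitting one probe per layer and recording held-out R^2
    # (reuses last_token_activations and the RidgeCV setup from the sketches above).
    layer_r2 = {}
    for layer in range(1, num_layers + 1):            # index 0 is the embedding output
        A = last_token_activations(entity_names, layer)
        A_tr, A_te, Y_tr, Y_te = train_test_split(A, Y, test_size=0.2, random_state=0)
        probe = RidgeCV(alphas=np.logspace(-1, 6, 15)).fit(A_tr, Y_tr)
        layer_r2[layer] = probe.score(A_te, Y_te)
    # Expect R^2 to rise through the early layers and flatten near the middle of the model.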

The dataset with the worst performance is the New York City dataset. This was expected given the relative obscurity of most of the entities compared with other datasets. However, this is also the dataset where the largest model has the best relative performance, suggesting that sufficiently large LLMs could eventually form detailed spatial models of individual cities.

3.2 LINEAR REPRESENTATIONS

Within the interpretability literature, there is a growing body of evidence supporting the linear representation hypothesis that features within neural networks are represented linearly, that is, the presence or strength of a feature can be read out by projecting the relevant activation on to some feature vector (Mikolov et al., 2013b; Olah et al., 2020; Elhage et al., 2022b). However, these results are almost always for binary or categorical features, unlike the continuous features of space or time.

Table 2: Out-of-sample R2 of linear and nonlinear (one layer MLP) probes for all models and features at 60% layer depth.

Figure 3: Out-of-sample R2 when entity names are included in different prompts for Llama-2-70b.

To test whether spatial and temporal features are represented linearly, we compare the performance of our linear ridge regression probes with that of substantially more expressive nonlinear MLP probes of the form W2ReLU(W1x + b1) + b2 with 256 neurons. Table 2 reports our results and shows that using nonlinear probes results in minimal improvement to R2 for any dataset or model. We take this as strong evidence that space and time are also represented linearly (or at the very least are linearly decodable), despite being continuous.
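The nonlinear comparison can be sketched with scikit-learn's one-hidden-layer MLPRegressor, which has exactly the stated form with 256 hidden units (hyperparameters beyond the hidden width are my assumptions; A_train, Y_train, and probe come from the earlier sketches):

    from sklearn.neural_network import MLPRegressor
    from sklearn.metrics import r2_score

    # One hidden layer of 256 ReLU units, i.e. W2 ReLU(W1 x + b1) + b2.
    mlp_probe = MLPRegressor(hidden_layer_sizes=(256,), activation="relu",
                             max_iter=2000, random_state=0)
    mlp_probe.fit(A_train, Y_train)

    print("linear probe R^2:", probe.score(A_test, Y_test))
    print("MLP probe R^2:   ", r2_score(Y_test, mlp_probe.predict(A_test)))
    # Comparable scores indicate the features are already linearly decodable.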

3.3 SENSITIVITY TO PROMPTING

Another natural question is if these spatial or temporal features are sensitive to prompting, that is, can the context induce or suppress the recall of these facts? Intuitively, for any entity token, an autoregressive model is incentivized to produce a representation suitable for addressing any future possible context or question.

To study this, we create new activation datasets where we prepend different prompts to each of the entity tokens, following a few basic themes. In all cases, we include an “empty” prompt containing nothing other than the entity tokens (and a beginning of sequence token). We then include a prompt which asks the model to recall the relevant fact, e.g., “What is the latitude and longitude of ” or “What was the release date of ’s .” For the United States and NYC datasets we also include versions of these prompts asking where in the US or NYC this location is, in an attempt to disambiguate common names of places (e.g. City Hall). As a baseline we include a prompt of 10 random tokens (sampled for each entity). To determine if we can obfuscate the subject, for some datasets we fully capitalize the names of all entities. Lastly, for the headlines dataset, we try probing on both the last token and on a period token appended to the headline.
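A sketch of how such prompt variants might be generated (illustrative only; the template wording and the helper name prompt_variants are not the paper's):

    import random

    def prompt_variants(entity, tokenizer, n_random=10):
        """Hypothetical templates following the themes described above."""
        random_prefix = tokenizer.decode(random.sample(range(tokenizer.vocab_size), n_random))
        return {
            "empty": entity,
            "fact": f"What is the latitude and longitude of {entity}",
            "random_tokens": f"{random_prefix} {entity}",
            "capitalized": entity.upper(),
        }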

We report results for the 70B model in Figure 3 and all models in Figure 8. We find that explicitly prompting the model for the information, or giving disambiguation hints like that a place is in the US or NYC, makes little to no difference in performance. However, we were surprised by the degree to which random distracting tokens degrades performance. Capitalizing the entities also degrades performance, though less severely and less surprisingly, as this likely interferes with “detokenizing” the entity (Elhage et al., 2022a; Gurnee et al., 2023; Geva et al., 2023). The one modification that did notably improve performance is probing on the period token following a headline, suggesting that periods are used to contain some summary information of the sentences they end.

4 ROBUSTNESS CHECKS

The previous section has shown that the true point in time or space of diverse types of events or locations can be linearly recovered from the internal activations of the mid-to-late layers of LLMs. However, this does not establish whether (or how) the model actually uses the feature direction learned by the probe, as the probe itself could be learning some linear combination of simpler features which are actually used by the model.

4.1 VERIFICATION VIA GENERALIZATION

Block holdout generalization To illustrate a potential issue with our results, consider the task of representing the full world map. If the model has, as we expect it does, an almost orthogonal binary feature for is in country X, then one could construct a high quality latitude (longitude) probe by summing these orthogonal feature vectors for each country with coefficient equal to the latitude (longitude) of that country. Assuming a place is in only one country, such a probe would place each entity at its country centroid. However, in this case, the model does not actually represent space, only country membership, and it is only the probe which learns the geometry of the different countries from the explicit supervision.

To better distinguish these cases, we analyze how the probes generalize when holding out specific blocks of data. In particular, we train a series of probes, where for each one, we hold out one country, state, borough, century, decade, or year for the world, USA, NYC, historical figure, entertainment, and headlines dataset respectively. We then evaluate the probes on the held out block of data. In Table 3, we report the average proximity error for the block of data when completely held out, compared to the error of the test points from that block in the default train-test split, averaged over all held out blocks.

We find that while generalization performance suffers, especially for the spatial datasets, it is clearly better than random. By plotting the predictions of the held out states or countries in Figures 11 and 12, a qualitatively clearer picture emerges. That is, the probe correctly generalizes by placing the points in the correct relative position (as measured by the angle between the true and predicted centroid) but not in their absolute position. We take this as weak evidence that the probes are extracting features explicitly learned by the model, while memorizing the transformation from model coordinates to human coordinates. However, this does not fully rule out the underlying binary features hypothesis, as there could be a hierarchy of such features that do not follow country or decade boundaries.
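In code, the block-holdout check amounts to grouping the train-test split by a metadata column; a sketch reusing the earlier helpers (the meta DataFrame and its country column are assumptions):

    # Block-holdout sketch: `meta` is an assumed DataFrame aligned row-for-row with A,
    # carrying a "country" column (or state/decade, depending on the dataset).
    held_out_errs = []
    for country in meta["country"].unique():
        hold = (meta["country"] == country).values
        probe_g = RidgeCV(alphas=np.logspace(-1, 6, 15)).fit(A[~hold], Y[~hold])
        Y_hat_hold = probe_g.predict(A[hold])
        # Proximity error computed within the held-out block for simplicity.
        held_out_errs.append(proximity_error(Y[hold], Y_hat_hold).mean())
    print("avg proximity error over held-out blocks:", np.mean(held_out_errs))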

Table 3: Average proximity error across blocks of data (e.g., countries, states, decades) when included in the training data compared to completely held out. Random performance is 0.5.

Cross entity generalization Implicit in our discussion so far is the claim that the model represents the space or time coordinates of different types of entities (like cities or natural landmarks) in a unified manner. However, similar to the concern that a latitude probe could be a weighted sum of membership features, a latitude probe could also be the sum of different (orthogonal) directions for the latitudes of cities and for the latitudes of natural landmarks.

Similar to the above, we distinguish these hypotheses by training a series of probes where the train-test split is performed to hold out all points of a particular entity class.1 Table 4 reports the proximity error for the entities in the default test split compared to when held out, averaged over all such splits as before. The results suggest that the probes largely generalize across entity types, with the main exception of the entertainment dataset.2

Model         Entity     World   USA     NYC     Historical  Entertainment  Headlines
Llama-2-7b    nominal    0.120   0.151   0.117   0.147       0.113          0.147
              held out   0.206   0.262   0.197   0.259       0.173          0.203
Llama-2-13b   nominal    0.313   0.367   0.310   0.377       0.266          0.322
              held out   0.164   0.168   0.153   0.159       0.149          0.149
Llama-2-70b   nominal    0.224   0.305   0.207   0.283       0.159          0.271
              held out   0.199   0.289   0.171   0.266       0.144          0.219

Table 4: Average proximity error across entity subtypes (e.g. books and movies) when included in the training data compared to being fully held out. Random performance is 0.5.

4.2 DIMENSIONALITY REDUCTION

Despite being linear, our probes still have d_model learnable parameters (ranging from 4096 to 8192 for the 7B to 70B models), enabling them to engage in substantial memorization. As a complementary form of evidence to the generalization experiments, we train probes with 2 to 3 orders of magnitude fewer parameters by projecting the activation datasets onto their k largest principal components. Figure 4 illustrates the test R2 for probes trained on each model and dataset over a range of k values, as compared to the performance of the full d_model-dimensional probe. We also report the test Spearman correlation in Figure 13, which increases much more rapidly with increasing k than the R2. Notably, the Spearman correlation only depends on the rank order of the predictions while R2 also depends on their actual value. We view this gap as further evidence that the model explicitly represents space and time, as these features must account for enough variance to be in the top dozen principal components, but that the probe requires more parameters to convert from the model's coordinate system to literal spatial coordinates or timestamps. We also observed that the first several principal components clustered the different entity types within the dataset, explaining why more than a few are needed.
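A sketch of the reduced-parameter probes (the particular k values are arbitrary; A_train and friends come from the earlier snippets):

    from sklearn.decomposition import PCA

    # Fit the probe on the top-k principal components instead of all d_model dimensions.
    for k in (10, 50, 100):
        pca = PCA(n_components=k).fit(A_train)
        probe_k = RidgeCV(alphas=np.logspace(-1, 6, 15)).fit(pca.transform(A_train), Y_train)
        print(f"k={k:4d}  test R^2 = {probe_k.score(pca.transform(A_test), Y_test):.3f}")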

Figure 4: Test R2 for probes trained on activations projected onto k largest principal components for each dataset and model compared to training on the full activations.

1 We only do this for entities which do not make up the majority of the training data (e.g., as is the case with populated places for the world dataset and songs for the entertainment dataset), which is partially responsible for the discrepancies in the nominal cases between Tables 3 and 4.

2 We note in this case the Spearman correlation is still high, suggesting this is an issue with bias generalization, as the different entity types are not uniformly distributed in time.

Figure 5: Space and time neurons in Llama-2 models. Depicts the result of projecting activation datasets onto neuron weights compared to true space or time coordinates with Spearman correlation by entity type.

5 SPACE AND TIME NEURONS

While the previous results are suggestive, none of our evidence directly shows that the model uses the features learned by the probe. To address this, we search for individual neurons with input or output weights that have high cosine similarity with the learned probe direction. That is, we search for neurons which read from or write to a direction similar to the one learned by the probe.

We find that when we project the activation datasets onto the weights of the most similar neurons, these neurons are indeed highly sensitive to the true location of entities in space or time (see Figure 5). In other words, there exist individual neurons within the model that are themselves fairly predictive feature probes. Moreover, these neurons are sensitive to all of the entity types within our datasets, providing stronger evidence for the claim that these representations are unified.
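A sketch of that search for one layer (module names follow the Hugging Face Llama implementation and are assumptions, as are the reused variables probe, model, A, Y, and layer from the earlier snippets):

    import torch
    from scipy.stats import spearmanr

    # Probe direction for one coordinate (e.g. latitude); probe.coef_ is (n_targets, d_model).
    probe_dir = torch.tensor(probe.coef_[0], dtype=torch.float32)
    probe_dir = probe_dir / probe_dir.norm()

    # MLP output weights of one block in the Hugging Face Llama layout: (d_model, d_mlp),
    # so each column is the direction a single MLP neuron writes to the residual stream.
    W_out = model.model.layers[layer].mlp.down_proj.weight.detach().float().cpu()
    cos = (W_out.T @ probe_dir) / W_out.norm(dim=0)      # cosine similarity per neuron
    best = int(cos.abs().argmax())

    # Project the activation dataset onto that neuron's weights and compare with the labels.
    neuron_proj = torch.tensor(A, dtype=torch.float32) @ W_out[:, best]
    print("best-matching neuron:", best,
          "| Spearman vs. coordinate:", spearmanr(neuron_proj.numpy(), Y[:, 0]).correlation)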

If probes trained with explicit supervision are an approximate upper bound on the extent to which a model represents these spatial and temporal features, then the performance of individual neurons is a lower bound. In particular, we generally expect features to be distributed in superposition (Elhage et al., 2022b), making individual neurons the wrong level of analysis. Nevertheless, the existence of these individual neurons, which received no supervision other than from next-token prediction, is very strong evidence that the model has learned and makes use of spatial and temporal features.

We also perform a series of neuron ablation and intervention experiments in Appendix B to verify the importance of these neurons in spatial and temporal modeling.

6 RELATED WORK

Linguistic Spatial Models Prior work has shown that natural language encodes geographic information (Louwerse & Zwaan, 2009; Louwerse & Benesh, 2012) and that relative coordinates can be approximately recovered with simple techniques like multidimensional scaling, co-occurrence statistics, or probing word embeddings (Louwerse & Zwaan, 2009; Mikolov et al., 2013a; Gupta et al., 2015; Konkol et al., 2017). However, these studies only consider a few hundred well known cities and obtain fairly weak correlations. Most similar to our work is Liétard et al. (2021), who probe word embeddings and small language models for the coordinates of global cities and whether countries share a border, but conclude the amount of geographic information learned is “limited,” likely because the largest model they study was 345M parameters (500x smaller than Llama 70B).

Neural World Models We consider a spatiotemporal model to be a necessary ingredient within a larger world model. The clearest evidence that such models are learnable from next-token prediction comes from GPT-style models trained on chess (Toshniwal et al., 2022) and Othello games (Li et al., 2022) which were shown to have explicit representations of the board and game state, with further work showing these representations are linear (Nanda et al., 2023). In true LLMs, Li et al. (2021) show that an entity’s dynamic properties or relations can be linearly read out from representations at different points in the context. Abdou et al. (2021) and Patel & Pavlick (2021) show LLMs have representations that reflect perceptual and conceptual structure in color and spatial domains.

Factual Recall The point in time or space of an event or place is a particular kind of fact. Our investigation is informed by prior work on the mechanisms of factual recall in LLMs (Meng et al., 2022a;b; Geva et al., 2023) indicating that early-to-mid MLP layers are responsible for outputting information about factual subjects, typically on the last token of the subject. Many of these works also show linear structure, for example in the factuality of a statement (Burns et al., 2022) or in the structure of subject-object relations (Hernandez et al., 2023). To our knowledge, our work is unique in considering continuous facts.

Interpretability More broadly, our work draws upon many results and ideas from the interpretability literature (Räuker et al., 2023), especially in topics related to probing (Belinkov, 2022), BERTology (Rogers et al., 2021), the linearity hypothesis and superposition (Elhage et al., 2022b), and mechanistic interpretability (Olah et al., 2020). More specific results related to our work include Hanna et al. (2023) who find a circuit implementing greater-than in the context of years, and Goh et al. (2021) who find “region” neurons in multimodal models that resemble our space neurons.

7 DISCUSSION

We have demonstrated that LLMs learn linear representations of space and time that are unified across entity types and fairly robust to prompting, and that there exist individual neurons that are highly sensitive to these features. We conjecture, but do not show, that these basic primitives underlie a more comprehensive causal world model used for inference and prediction.

Our analysis raises many interesting questions for future work. While we showed that it is possible to linearly reconstruct a sample’s absolute position in space or time, and that some neurons use these probe directions, the true extent and structure of spatial and temporal representations remain unclear. We conjecture that the most canonical form of this structure is a discretized hierarchical mesh, where any sample is represented as a linear combination of its nearest basis points at each level of granularity. Moreover, the model can and does use this coordinate system to represent absolute position using the correct linear combination of basis directions in the same way a linear probe would. We expect that as models scale, this mesh is enhanced with more basis points, more scales of granularity (e.g. neighborhoods in cities), and more accurate mapping of entities to model coordinates (Michaud et al., 2023). This suggests future work on extracting representations in the model’s coordinate system rather than trying to reconstruct human interpretable coordinates, perhaps with sparse autoencoders (Cunningham et al., 2023).

We also barely scratched the surface of understanding how these spatial and temporal models are learned, recalled, and used internally, or to what extent these representations exist within a more comprehensive world model. By looking across training checkpoints, it may be possible to localize a point in training when a model organizes constituent is in place X features into a coherent geometry or else conclude this process is gradual (Liu et al., 2021). We expect that the model components which construct these representations are similar or identical to those for factual recall (Meng et al., 2022a; Geva et al., 2023).

Finally, we note that the representation of space and time has received much more attention in biological neural networks than artificial ones (Buzsáki & Llinás, 2017; Schonhaut et al., 2023). Place and grid cells (O’Keefe & Dostrovsky, 1971; Hafting et al., 2005) in particular are among the most well-studied in the brain and may be a fruitful source of inspiration for future work on LLMs.
