Circuit Tracing: Revealing Computational Graphs in Language Models (Anthropic) (transformer-circuits.pub)
bob1029 21 hours ago [-]
> Deep learning models produce their outputs using a series of transformations distributed across many computational units (artificial “neurons”). The field of mechanistic interpretability seeks to describe these transformations in human-understandable language.

This is the central theme behind why I find techniques like genetic programming to be so compelling. You get interpretability by default. The second order effect of this seems to be that you can generalize using substantially less training data. The humans developing the model can look inside the box and set breakpoints, inspect memory, snapshot/restore state, follow the rabbit, etc.

The biggest tradeoff here being that the search space over computer programs tends to be substantially more rugged. You can't use math tricks to cheat the computation. You have to run every damn program end-to-end and measure the performance of each directly. However, you can execute linear program tapes very, very quickly on modern x86 CPUs. You can search through a billion programs with a high degree of statistical certainty in a few minutes. I believe we are at a point where some of the ideas from the 20th century are viable again.
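As a toy sketch of what executing a "linear program tape" might look like, here is a minimal interpreter over a flat instruction list. The opcodes and semantics are invented for illustration, not taken from the comment:

```python
# Toy sketch of a "linear program tape": a flat list of (opcode, operand)
# pairs executed sequentially over a byte buffer. The instruction set here
# is invented for illustration; real systems differ.
def run_tape(tape, data, max_steps=10_000):
    acc = 0
    data = bytearray(data)
    for step, (op, arg) in enumerate(tape):
        if step >= max_steps:
            break
        if op == "LOAD":                     # acc <- data[arg]
            acc = data[arg % len(data)]
        elif op == "ADD":                    # acc <- (acc + arg) mod 256
            acc = (acc + arg) & 0xFF
        elif op == "STORE":                  # data[arg] <- acc
            data[arg % len(data)] = acc
    return bytes(data)

# A three-instruction tape that increments the first input byte.
tape = [("LOAD", 0), ("ADD", 1), ("STORE", 0)]
print(run_tape(tape, b"A"))  # b'B'
```

Straight-line tapes like this (no branches) are what make bulk evaluation on a CPU so cheap: every program is just a tight loop over an array.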

radarsat1 15 hours ago [-]
For a complex enough problem (like next word prediction on arbitrary text), I really have my doubts that any such method will result in an "interpretable" solution. More likely you end up with a giant stack of indecipherable if statements, gotos, and random multiplications. And that's assuming no matrices are involved, introduce those and you've just got a non-differentiable, non-parallelizable neural network.
dekhn 12 hours ago [-]
Interpretability is nice, I guess, but what if the underlying latent model for a real-world system is not human-understandable? If a system provides interpretability by default, does it fail to build a model for a system that can't be interpreted? Personally, I think the answer is: it still builds a model, but produces an interpretation that can't be understood by people.
esafak 16 hours ago [-]
Where do the features come from, feature engineering? That's the method that failed the bitter lesson. Why would you use genetic programming when you can do gradient descent?
bob1029 16 hours ago [-]
> Where do the features come from, feature engineering? That's the method that failed the bitter lesson.

That would be the whole point of genetic programming. You don't have to do feature engineering at all.

Genetic programming is a more robust interpretation of the bitter lesson than transformer architectures and DNNs. There are fewer clever tricks you need to apply to get the job done. It is more about unmitigated raw compute than anything else out there.

In my experiments, there is absolutely zero transformation, feature engineering, normalization, tokenization, etc. It is literally:

1. Copy input byte sequence to program data region

2. Execute program

3. Copy output byte sequence from program data region

Half of this problem is about how you search for the programs. The other half is about how you measure them. Beyond that, there isn't much to worry about other than how many CPUs you have on hand.
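A minimal sketch of that measure-and-search split (hypothetical; the `execute` and `mutate` callables, the scoring rule, and the selection scheme are illustrative assumptions, not the commenter's actual setup):

```python
import random

def fitness(program, cases, execute):
    """Score a program by how many output bytes it gets right across test cases.
    `execute` performs steps 1-3 above: copy input in, run, copy output out."""
    score = 0
    for inp, expected in cases:
        out = execute(program, inp)
        score += sum(a == b for a, b in zip(out, expected))
    return score

def search(population, cases, execute, mutate, generations=100):
    """Evolve: keep the best half, refill with mutated copies of survivors."""
    for _ in range(generations):
        population.sort(key=lambda p: fitness(p, cases, execute), reverse=True)
        survivors = population[: len(population) // 2]
        population = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return population[0]  # best survivor of the final generation
```

Because each `fitness` call is independent, the evaluation loop parallelizes trivially across CPU cores, which is where the "how many CPUs you have on hand" point comes in.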

esafak 15 hours ago [-]
Where does the genome, genetic representation, you are evolving come from? The same raw features you use in neural networks? Then you optimize using that? If so, why not use gradient descent, which is faster? And this is still a step behind neural networks even apart from the optimization method, because neural networks use composition to learn features. How are you doing that?

Do you have any real world examples of your method that are competitive with DL methods?

bob1029 15 hours ago [-]
> Where does the genome, genetic representation, you are evolving come from

The instruction set of the program that is being searched for.

This is probably the best publicly available summary of the idea I am pursuing:

https://github.com/kurtjd/brainfuck-evolved

esafak 15 hours ago [-]
A program is composed of arbitrarily many instructions of your set. How are you accounting for this; trying every possible program length? And you are considering the simpler case where the search space is discrete, unlike the continuous spaces in most machine learning problems.

I think you need to think this through some more. You may see there is a reason nobody uses genetic algorithms for real world tasks.

bob1029 14 hours ago [-]
> How are you accounting for this; trying every possible program length?

Part of the mutation function involves probabilistically growing and shrinking the program size (i.e., inserting and removing random instructions).
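A sketch of such a grow/shrink mutation function (the opcode set and per-instruction rates are illustrative assumptions):

```python
import random

OPCODES = list(range(10))  # e.g. an instruction set with 10 unique opcodes

def mutate(program, p_insert=0.01, p_delete=0.01, p_point=0.01):
    """Return a mutated copy: instructions may be dropped, replaced, or new
    random instructions inserted, so program length can grow or shrink."""
    child = []
    for ins in program:
        if random.random() < p_delete:       # shrink: drop this instruction
            continue
        if random.random() < p_point:        # point mutation: replace opcode
            ins = random.choice(OPCODES)
        child.append(ins)
        if random.random() < p_insert:       # grow: insert a random instruction
            child.append(random.choice(OPCODES))
    return child
```

With equal insert and delete rates, the expected length is stationary; biasing one rate over the other pushes the population toward longer or shorter programs.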

> And you are considering the simpler case where the search space is discrete, unlike the continuous spaces in most machine learning problems.

All "continuous spaces" that embody modern machine learning techniques are ultimately discrete.

esafak 14 hours ago [-]
No, they are not. Model outputs can be discretized but the model parameters (excluding hyperparameters) are typically continuous. That's why we can use gradient descent.
bob1029 14 hours ago [-]
Where are the model parameters stored and how are they represented?
esafak 14 hours ago [-]
On disk or in memory, as multidimensional arrays ("tensors" in ML speak).
bob1029 13 hours ago [-]
Do we agree that these memories consist of a finite # of bits?
esafak 13 hours ago [-]
Yes, of course.

Consider a toy model with just 1000 double-precision (64-bit) parameters, i.e. 64,000 bits. If you're going to randomly flip bits over this 2^64,000 search space while you evaluate a nontrivial fitness function, genetic style, you'll be waiting for a long time.
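For scale, the size of that bit-pattern space can be checked directly (a back-of-the-envelope computation, assuming exactly 1000 × 64 bits):

```python
import math

bits = 1000 * 64                              # 1000 float64 parameters
digits = math.floor(bits * math.log10(2)) + 1 # decimal digits of 2**bits
print(digits)                                 # 19266: 2**64000 has ~19,266 digits
```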

bob1029 13 hours ago [-]
I agree if you approach it naively you will accomplish nothing.

With some optimization, you can evolve programs with search spaces of 10^10000 states (i.e., 10 unique instructions, 10000 instructions long) and beyond.

Visiting every possible combination is not the goal here.

dartos 15 hours ago [-]
You're talking about specifically using genetic programming to create new programs, as opposed to gradient descent in LLMs to minimize a loss function, right?

How would you construct a genetic algorithm to produce natural language like LLMs do?

Forgive me if I'm misunderstanding, but in programming we have "tokens", which are minimal meaningful units of code.

For natural languages it's harder. "Words" are not super meaningful on their own, I don't think (at least not as much as a token), so how would you break down natural language for a genetic algorithm?

bob1029 14 hours ago [-]
> how would you break down natural language for a genetic algorithm?

The entire point is that you do not bother trying. From an information theory and computational perspective, raw UTF-8 bytes can work just as well as "tokens".

The program that is being evolved is expected to develop whatever strategy is best suited to providing the desired input/output transformation. Back to the bitter lesson on this one.
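As a hypothetical illustration of scoring directly on raw UTF-8 bytes with no tokenizer in between (the `score` helper and identity candidate are invented for illustration):

```python
def score(candidate, prompt, target):
    """Fraction of target bytes reproduced, scored directly on raw UTF-8;
    no tokenizer or feature extraction in between."""
    out = candidate(prompt.encode("utf-8"))
    want = target.encode("utf-8")
    hits = sum(a == b for a, b in zip(out, want))
    return hits / max(len(want), 1)

# Toy candidate program: echo the input bytes unchanged.
identity = lambda raw: raw
print(score(identity, "hello", "hello"))  # 1.0
```

Whatever internal segmentation of the byte stream is useful is left for the evolved program itself to discover.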

dartos 12 hours ago [-]
I’ll need to read up on genetic algorithms, I think.

That sounds really cool, but coming from training other statistical models, I'm having a hard time imagining what the training loop looks like.

wasabi991011 10 hours ago [-]
Seems interesting, do you have any place to read more?

I took a look at DEAP, but it seems to be more tree-based, whereas you seem to be talking about "linear program tapes", which I know nothing about.

Also, it seems like the examples I find online of genetic programming are mostly discrete optimization, sometimes policy. The only classification problem that DEAP gave as an example was spambase, which uses pre-computed features (word frequencies) as the dataset (rather than the raw emails).

Can you describe linear program tapes a bit? And give an example of a machine learning task more similar to where DNN are used that would be amenable to GP without feature engineering?

JSR_FDED 17 hours ago [-]
I’m also intrigued by genetic programming. One of the benefits, if I understand correctly, is that it is more resistant to getting stuck in local maxima.
esafak 15 hours ago [-]
Overparameterized neural networks don't have that problem because in practice there are no bad local optima; there are many roads to Rome.
ironbound 22 hours ago [-]
For people new to this maybe check out this video, it explains how the internals run pretty quickly https://m.youtube.com/watch?v=UKcWu1l_UNw

In theory if Anthropic puts research into the mechanics of the models internals, we can get better returns in training and alignment.

somethingsome 21 hours ago [-]
Is the pdf available somewhere?
yorwba 20 hours ago [-]
The Transformer Circuits Thread is an HTML-only journal. Of course you can convert the content to PDF, but then you lose the interactive elements.
somethingsome 18 hours ago [-]
That's kind of worrying for longevity. I was hoping some export would be available by default, even without the interactions. I don't care that much about the interactions; I care more about the content. Web technologies come and go, and are subject to change and breakage.
halayli 18 hours ago [-]
then print it as pdf
chromaton 12 hours ago [-]
The PDF conversions I've tried in Firefox and Chromium don't work that well.