MacIntyrean Rationality of Traditions and Machine Learning

 

NB Page under construction - Very preliminary notes

Notes on what a MacIntyrean analysis of the rivalry between traditions in machine learning might yield.

Popper vs Logical Empiricism (Vienna Circle+Berliners) (+ Bayesian successors)

There is a political context to their work, unsurprisingly for thinkers from early 20th-century Mitteleuropa. Their philosophies are comparable, and both sides were interested in demarcation. One major difference is the focus on confirmation/induction versus falsification. For Popper there is no such thing as induction: generalisations cannot be proved, only falsified. Don't seek out probable hypotheses, as these aren't risky; they're safe, since they don't exclude many possible events. Contrast the daring Einstein, who sticks his neck out and predicts the bending of light, with a psychoanalyst such as Adler, who is able to explain absolutely any kind of behaviour.

Popper argued against probabilistic accounts of confirmation: the space of theories is large enough that the prior of any theory saying anything empirical must be zero, and applying Bayes' formula will leave it at zero.
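Bayes' formula makes the last step mechanical: P(h | e) = P(e | h) P(h) / P(e) is a multiple of the prior P(h), so a prior of zero stays zero however favourable the evidence. The Python sketch below takes a single hypothesis h against its negation; the likelihood values 0.99 and 0.5 are arbitrary illustrative numbers, not drawn from Popper.

```python
def update(prior_h, lik_h, lik_not_h):
    """One application of Bayes' formula to hypothesis h versus not-h.

    P(h | e) = P(e | h) P(h) / P(e), where
    P(e) = P(e | h) P(h) + P(e | not-h) P(not-h).
    """
    evidence = lik_h * prior_h + lik_not_h * (1.0 - prior_h)
    return lik_h * prior_h / evidence


def after_confirmations(prior_h, lik_h, lik_not_h, n):
    """Posterior for h after n independent, equally favourable observations."""
    p = prior_h
    for _ in range(n):
        p = update(p, lik_h, lik_not_h)
    return p


# Illustrative likelihoods: each observation is nearly twice as likely under h.
print(after_confirmations(0.0, 0.99, 0.5, 100))   # 0.0: a zero prior never moves
print(after_confirmations(1e-6, 0.99, 0.5, 100))  # very close to 1
```

However small a positive prior is, repeated favourable evidence drives it upward; only an exact zero is immune, which is why Popper's argument targets the prior itself.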

In the context of statistical learning: Vapnik vs Jaynes (Neal)

Jaynes talks about a Robot with specific background information, so we might include him as a machine learning theorist. Jaynes is quite rude about philosopher Bayesians, with the exception of Ramsey (died at 26).

Vapnik, on the other hand, is full of praise for Popper:

"...obtaining uniform one-sided convergence using uniform two-sided convergence is not only a technical detail. To find these conditions, it is necessary to construct a mathematical generalization of one of the most impressive ideas in the philosophy of science-the idea of nonfalsifiability..." (p. 106)

That's a lot of manhours for this one idea! Popper advises that we propose daring conjectures, i.e., ones which are as falsifiable as possible. They must also be as well-corroborated as their predecessor theory. Corroboration is the passing of tests, as severe as possible.

Popper uses 'Bewährung' (standing the test, proving one's worth) for corroboration, while he sees confirmation as closer to 'erhärten' (to harden, firm up) or 'bestätigen' (to confirm). He wishes to avoid the sense of opinions 'firming up'.

Popper has two methods of comparing theories:

1) containment relation between classes of falsifiers

determined by degree of universality and degree of precision (of predicate and of measurement). So "all planets move in ellipses" is less universal and less precise, and hence less falsifiable, than "all heavenly bodies move in circles".

2) dimension of theory: a forerunner of the VC-dimension?

"if there exists, for a theory t, a field of singular (but not necessarily basic) statements such that, for some number d, the theory cannot be falsified by any d-tuple of the field, although it can be falsified by certain d+1-tuples, then we call d the characteristic number of the theory with respect to that field. All statements of the field whose degree of composition is less than d, or equal to d, are then compatible with the theory, and permitted by it, irrespective of their content." (p. 130)

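One way to test the suggested link to the VC-dimension, under a reading assumed here rather than taken from Popper: treat a 'singular statement' as a labelled point (x, y) with y in {0, 1}, and a theory as a class of 0/1-valued hypotheses, so that a d-tuple of statements falsifies the theory when no hypothesis reproduces those labels. On this reading, Popper's characteristic number requires every d-set of points to be shattered (no labelling of it is forbidden) and some (d+1)-set not to be, whereas the VC-dimension only requires some d-set to be shattered. The Python sketch below brute-forces both on a small finite domain; the interval class and the toy class subsets_of_01 are illustrative choices, not examples from Popper or Vapnik.

```python
from itertools import combinations


def shatters(hypotheses, points):
    """True if the hypotheses realise all 2**len(points) labellings of points."""
    labellings = {tuple(h(p) for p in points) for h in hypotheses}
    return len(labellings) == 2 ** len(points)


def vc_dimension(hypotheses, domain):
    """Largest d such that SOME d-subset of the domain is shattered."""
    return max(d for d in range(len(domain) + 1)
               if any(shatters(hypotheses, s) for s in combinations(domain, d)))


def characteristic_number(hypotheses, domain):
    """Popper-style reading: EVERY d-subset is shattered (no labelled d-tuple
    falsifies the class) but some (d+1)-subset is not. None if never falsified."""
    for d in range(len(domain)):
        if not all(shatters(hypotheses, s) for s in combinations(domain, d + 1)):
            return d
    return None


# Closed intervals [a, b] on the integers 0..4, endpoints on half-integers
# (a == b gives the empty interval).
domain = list(range(5))
cuts = [c + 0.5 for c in range(-1, 5)]
intervals = [lambda x, a=a, b=b: int(a <= x <= b)
             for a in cuts for b in cuts if a <= b]

# A toy class that can only ever switch on points 0 and 1.
subsets_of_01 = [lambda x, s=s: int(x in s)
                 for s in (frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1}))]

print(vc_dimension(intervals, domain), characteristic_number(intervals, domain))  # 2 2
print(vc_dimension(subsets_of_01, [0, 1, 2]),
      characteristic_number(subsets_of_01, [0, 1, 2]))  # 2 0
```

For intervals the two numbers coincide, but the toy class shows how they can come apart: it shatters {0, 1}, yet a single statement ('point 2 is labelled 1') already falsifies it.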
For any two theories it is possible that neither, one, or both methods of comparison apply. When both apply, it's possible that the dimensions agree but that one class of falsifiers is properly contained in the other. To what extent is dimension linked to the number of freely determinable parameters?
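The first method can be sketched as plain set containment. As a simplifying assumption, each potential falsifier below is a single observation (body, orbit), with 'ellipse' meaning a non-circular ellipse, so that a circular orbit is compatible with an elliptical law. The four generalisations are hypothetical variants of the planets-and-ellipses example above; the code shows one pair ordered by containment and one pair that is incomparable, the 'neither' case.

```python
from itertools import product

BODIES = ("planet", "comet")
ORBITS = ("circle", "ellipse", "other")  # "ellipse" = non-circular ellipse
OBSERVATIONS = set(product(BODIES, ORBITS))


def falsifiers(permits):
    """The class of potential falsifiers: observations the theory forbids."""
    return {obs for obs in OBSERVATIONS if not permits(*obs)}


# Each theory is a predicate saying which (body, orbit) observations it permits.
theories = {
    "all heavenly bodies move in circles": lambda b, o: o == "circle",
    "all planets move in circles": lambda b, o: b != "planet" or o == "circle",
    "all heavenly bodies move in ellipses": lambda b, o: o in ("circle", "ellipse"),
    "all planets move in ellipses": lambda b, o: b != "planet" or o in ("circle", "ellipse"),
}


def compare(t1, t2):
    """Compare two theories by containment of their classes of falsifiers."""
    f1, f2 = falsifiers(theories[t1]), falsifiers(theories[t2])
    if f1 == f2:
        return "equally falsifiable"
    if f1 > f2:
        return "first is more falsifiable"
    if f1 < f2:
        return "second is more falsifiable"
    return "incomparable by containment"


print(compare("all heavenly bodies move in circles", "all planets move in ellipses"))
print(compare("all planets move in circles", "all heavenly bodies move in ellipses"))
```

The first comparison reports the circles law as more falsifiable; the second pair trades universality against precision, so neither class of falsifiers contains the other.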

"In algebraic representation, the dimension of a set of curves depends upon the number of parameters whose values we may freely choose. We can therefore say that the number of freely determinable parameters of a set of curves by which a theory is represented is characteristic for the degree of falsifiability (or testability) of that theory." (p. 131)

So he might identify simplicity either with degree of falsifiability or with number of parameters. But in the end he comes down in favour of the former:

"The epistemological questions which arise in connection with the concept of simplicity can all be answered if we equate this concept with degree of falsifiability." (p. 140)

"Above all, our theory explains why simplicity is so highly desirable. To understand this there is no need for us to assume a 'principle of economy of thought' or anything of the kind. Simple statements, if knowledge is our object, are to be prized more highly than less simple ones because they tell us more; because their empirical content is greater; and because they are better testable." (p. 142)

I can't see any sign that he imagined situations where the number of parameters and the dimension of a theory diverge, but presumably he would have continued to follow the latter in such cases. However, although he takes sine oscillations to be simple (p. 143), he gives no thought to the number of points needed to falsify this class.
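For polynomial curves the two measures do coincide, and this can be checked by brute force. Assume, for illustration only, that the theory says the data lie on a polynomial of degree at most k, which has k + 1 freely determinable parameters, and that the field of singular statements is a small grid of points (x, y), counting only tuples with distinct x-coordinates (otherwise two readings at the same x would refute any functional law). The Python sketch below finds the smallest d for which some (d + 1)-tuple falsifies the theory, using exact Lagrange interpolation; the grid and the function names are hypothetical choices.

```python
from fractions import Fraction
from itertools import combinations


def interpolate_at(points, x):
    """Value at x of the unique polynomial of degree < len(points) through points
    (Lagrange form, exact rational arithmetic; x-coordinates must be distinct)."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


def compatible(points, k):
    """True if some polynomial of degree <= k passes through every point."""
    if len(points) <= k + 1:
        return True  # interpolation through <= k+1 distinct x's always succeeds
    base = points[: k + 1]
    return all(interpolate_at(base, x) == y for x, y in points[k + 1:])


def curve_characteristic_number(field, k):
    """Least d such that some (d+1)-tuple of the field (distinct x's) falsifies."""
    for d in range(len(field)):
        for tup in combinations(field, d + 1):
            if len({x for x, _ in tup}) == d + 1 and not compatible(list(tup), k):
                return d
    return None


field = [(x, y) for x in range(4) for y in range(3)]
for k in range(3):
    print(f"degree <= {k}: parameters = {k + 1}, "
          f"characteristic number = {curve_characteristic_number(field, k)}")
```

On this grid the characteristic number comes out equal to the parameter count for constants, lines and parabolas, as Popper's identification would predict.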

The class of functions A sin(bx) + c is often held up as a counterexample to the claim that 'a small number of parameters implies low VC-dimension'.

But if a margin is required, won't there be some finite VC-dimension depending on the margin width and the size of the data, as with the VC-dimension of delta-margin separating hyperplanes? I guess the margin would put a limit on the size of b. Perhaps it is better to talk about the precision with which parameters need to be given.
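A standard demonstration that one parameter can give infinite VC-dimension uses a special case of this class (A = 1, c = 0, thresholded at zero): on the points x_i = 10^(-i), i = 1, ..., n, any labelling y_i in {0, 1} is produced by the frequency b = pi * (1 + sum_i (1 - y_i) 10^i). Since this subclass shatters arbitrarily many points, so does the full class. The Python sketch below checks all 2^n labellings for n = 6, a size chosen arbitrarily and kept small so that double-precision rounding stays far below the margins involved.

```python
import math
from itertools import product


def b_for_labels(labels):
    """Frequency b making [sin(b * 10**-i) > 0] equal labels[i-1] for each i."""
    n_over_pi = 1 + sum((1 - y) * 10 ** i for i, y in enumerate(labels, start=1))
    return math.pi * n_over_pi


def classify(b, x):
    """Threshold the curve sin(bx) at zero (the case A = 1, c = 0)."""
    return int(math.sin(b * x) > 0)


n = 6
points = [10.0 ** -i for i in range(1, n + 1)]
all_realised = all(
    [classify(b_for_labels(labels), x) for x in points] == list(labels)
    for labels in product((0, 1), repeat=n)
)
print(all_realised)  # True: one parameter, yet all 2**6 labellings are produced
```

The construction bears on the point about precision: the smallest margin here is about pi * 10^(-n), while b grows like 10^n, so realising every labelling requires b to be given to ever more digits.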

The following attitude of Popper's suggests that he doesn't foresee any of the developments of statistical learning theory (SLT) in the direction of transduction, etc.

"...our methodological decision - sometimes metaphysically interpreted as the principle of causality - is to leave nothing unexplained, i.e. always to try to deduce statements from others of higher universality. This decision is derived from the demand for the highest attainable degree of universality and precision, and it can be reduced to the demand, or rule, that preference should be given to those theories which can be most severely tested." (p. 123)

There seems to be nothing about how reliable well-corroborated theories are, i.e., how much we can trust their predictions:

"Like inductive logic in general, the theory of the probability of hypotheses seems to have arisen through a confusion of psychological with logical questions. Admittedly, our subjective feelings of conviction are of different intensities, and the degree of confidence with which we await the fulfilment of a prediction and the further corroboration of a hypothesis is likely to depend, among other things, upon the way in which this hypothesis has stood up to tests so far - upon its past corroboration. But that these psychological questions do not belong to epistemology or methodology is pretty well acknowledged even by believers in probability logic. (Note: I am alluding here to the school of Reichenbach rather than to Keynes.)" (p. 255)

There's a common criticism of Bayesianism (e.g., by Lakatos) that it leads to a kind of laziness. Where the Popperian is always on the lookout for experience that will falsify their beliefs, does anything stop the Bayesian locking themselves up in a dark room to protect their degrees of belief?

Jaynes naturally disagrees, and was especially rude about Popper: of course there is induction in science, and when it fails you learn something, namely that your models aren't good.

Popper also faces problems from the Duhem-Quine thesis. If (T & A) -> e, then not-e -> (not-T or not-A). So when an experiment fails to deliver what is expected, one doesn't know whether to blame the theory or the auxiliary assumptions about the world, the equipment, etc. In statistical learning terms, Popper's scheme has no notion of soft margins.
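The logical point can be checked by enumerating truth values. Once (T & A) -> e is assumed and e fails, every surviving assignment makes the conjunction T & A false, but some keep T true and put the blame on A. A minimal Python sketch (plain propositional logic, nothing specific to learning theory):

```python
from itertools import product


def implies(p, q):
    """Material implication p -> q."""
    return (not p) or q


# Truth assignments to T (theory), A (auxiliary assumptions) and e (prediction)
# that satisfy the premise (T & A) -> e and the failed observation not-e.
survivors = [
    (T, A, e)
    for T, A, e in product((True, False), repeat=3)
    if implies(T and A, e) and not e
]

for T, A, e in survivors:
    print(f"T={T!s:<5} A={A!s:<5} e={e}")

print("T & A refuted in every case:", all(not (T and A) for T, A, _ in survivors))
print("T still true in some case:", any(T for T, _, _ in survivors))
```

Three assignments survive, and the failed prediction alone cannot choose between them.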

There are also problems in that no scientist seemed to behave completely Popperianly, while some 'non-scientists', such as Freud, appeared to behave quite Popperianly. E.g., Freud gave up the seduction hypothesis (that all neuroses are caused by early seductions) in view of its predictions (seductions he knew hadn't happened). Later, Popper stresses the direction that theory change must take: it must become ever more falsifiable. After the refutation of "All planets behave...", you should never say "All planets, except X, behave...". Assessment becomes a comparison of one theory with its successor, which brings in the historical dimension. So Freud's replacement of the seduction hypothesis by the notion of seduction happening in 'psychic reality' is seen as a response to falsification whose outcome is a less falsifiable theory.

Both Popper and Logical Empiricism were damaged by the thesis that data doesn't come in a raw form, it's mediated by the theories and instruments, themselves embodied theories. (Models mediating between theory and data are now a hot topic.) Leads to Kuhn, paradigms, revolutions, indoctrination. Indoctrination is good, without it you'd never look closely enough at a theory to find (unintentionally) its anomalies. Anomalies build up, rival theories are proposed, but there is no moment when it is rational to jump ship. Kuhn uses phrase such as 'Gestalt switch' and 'the scales falling from one's eyes'.

Lakatos worries that Kuhn is advocating 'mob psychology', and tries to find an improved Popperian account. Contra Popper, he claims that theories are born refuted; Newton fails as a Popperian. The right unit at which to assess a piece of science is the research programme, a series of theories with a unifying heuristic spirit which provides the resources for deciding which path to travel, how to react to obstacles, and so on. Rationality is not about which proposition to believe, but about which programme it's rational to sign up to: which is progressing, and which degenerating. The criteria are heuristic, theoretical and empirical progress.

MacIntyre: these criteria cannot work if they are taken to be employable by people from outside the programme, the neutral standpoint is an Enlightenment dream. Theories "...progress or fail to progress and they do so because and insofar as they provide by their incoherences and their inadequacies - incoherences and inadequacies judged by the standards of body of theory itself - a definition of problems, the solution of which provides direction for the formulation and reformulation of that body of theory."

MacIntyre comes from moral philosophy, where disagreements run very deep. For example, in the 12th and 13th centuries debates between Aristotelians and Augustinians were conducted in the University of Paris and elsewhere. Clearly, there is much less shared background as to what constitutes our sovereign good than there is between, say, a Newtonian and an Einsteinian as to what a cosmological theory should look like; the latter share enough to dictate that something as specific as accounting for Mercury's behaviour be a common goal.

Enquiry and virtues:

A living tradition then is an historically extended, socially embodied argument, and an argument precisely in part about the goods which constitute that tradition. Within a tradition the pursuit of goods extends through generations, sometimes through many generations. Hence the individual's search for his or her good is generally and characteristically conducted within a context defined by those traditions of which the individual's life is a part, and this is true both of those goods which are internal to practices and of the goods of a single life. Once again the narrative phenomenon of embedding is crucial: the history of a practice in our time is generally and characteristically embedded in and made intelligible in terms of the larger and longer history of the tradition through which the practice in its present form was conveyed to us; the history of each of our own lives is generally and characteristically embedded in and made intelligible in terms of the larger and longer histories of a number of traditions. I have to say 'generally and characteristically' rather than 'always', for traditions decay, disintegrate and disappear. What then sustains and strengthens traditions? What weakens and destroys them?

The answer in key part is: the exercise or the lack of exercise of the relevant virtues. The virtues find their point and purpose not only in sustaining those relationships necessary if the varieties of goods internal to practices are to be achieved and not only in sustaining the form of an individual life in which that individual may seek out his or her good as the good of his or her whole life, but also in sustaining those traditions which provide both practices and individual lives with their necessary historical context. Lack of justice, lack of truthfulness, lack of courage, lack of the relevant intellectual virtues - these corrupt traditions, just as they do those institutions and practices which derive their life from the traditions of which they are the contemporary embodiments. To recognize this is of course also to recognize the existence of an additional virtue, one whose importance is perhaps most obvious when it is least present, the virtue of having an adequate sense of the traditions to which one belongs or which confront one. This virtue is not to be confused with any form of conservative antiquarianism; I am not praising those who choose the conventional conservative role of laudator temporis acti [praiser of times past]. It is rather the case that an adequate sense of tradition manifests itself in a grasp of those future possibilities which the past has made available to the present. Living traditions, just because they continue a not-yet-completed narrative, confront a future whose determinate and determinable character, so far as it possesses any, derives from the past. (After Virtue: 222-3)

This is not so far from Lakatos, with shades of his notion of degenerating research programmes, but MacIntyre insists that to gauge the progress of a tradition you need to be trained in it, since the criteria of success are specific to each tradition.

[...just because at any particular moment the rationality of a craft is justified by its history so far, which has made it what it is in that specific time, place, and set of historical circumstances, such rationality is inseparable from the tradition through which it was achieved. To share in the rationality of a craft requires sharing in the contingencies of its history, understanding its story as one's own, and finding a place for oneself as a character in the enacted dramatic narrative which is that story so far. The participant in a craft is rational qua participant insofar as he or she conforms to the best standards of reason discovered so far, and the rationality in which he or she thus shares is always, therefore, unlike the rationality of the encyclopaedic mode, understood as a historically situated rationality, even if one which aims at a timeless formulation of its own standards which would be their final and perfected form through a series of successive reformulations, past and yet to come. (Alasdair MacIntyre, Three Rival Versions of Moral Enquiry, Duckworth 1990: 65)]

This allows for a stronger form of incommensurability than Lakatos admits, without leading to a complete relativism in which each participant acts according to the different rational standards of their own tradition, as we'll see below.

For Lakatos, rational theory choice is possible to the extent that an "internal history" or "rational reconstruction" can be formulated according to which one rival wins out over the other. This allows for a departure from actual history, which generally shows programmes to be incommensurable. One rational reconstruction is superior to another if it constitutes more of actual history as rational.

MacIntyre disagrees: "it matters enormously that histories should be true, just as it matters that our scientific theories make truth one of their goals." There is a lot to say about his notion of truth, which certainly isn't one of correspondence to mind-independent facts. But rather than discuss this, let's return to machine learning.

Philosophy of Science and Machine Learning

General philosophy of science is dwindling. There is some continuing work on Bayesianism, though not enough to thrill working learning theorists, and on causality, which has had some impact on artificial intelligence via Pearl and Glymour. The focus has moved to realism. Elsewhere, philosophers of science have become specialised, but the specificity of the philosophy of population genetics or of quantum field theory is unlikely to be of much help, since these are theories with rich conceptual frameworks. Their interest is in 'communicative rationality', establishing the best way to organise the principles of a field, rather than in 'instrumental rationality', the way to achieve fixed ends most efficiently.

Perhaps the right direction to look in is the other way: to what extent are these learning theories relevant to the level at which philosophy of science operates? Jaynes does have something to say about Bayesianism and science (cf. 'Queer uses for probability theory'), and Polya gives an analysis of plausible mathematical reasoning.

In summary, Popper vs inductive logic, as a debate in philosophy of science relevant to humans, largely fails because neither side offers much help with changes to the larger conceptual framework. Humans aren't just after truth, but significant truth. This leaves reasoning within a fixed framework, where significance is already guaranteed. Here there has to be a notion of support; Popper uses corroboration, as distinguished from confirmation. He later tried to develop a notion of verisimilitude, which is generally agreed to have been unsuccessful and is not so dissimilar to confirmation. There is unlikely to be anything here better than what expert learning theorists of either party can tell you.

How about we consider what philosophy of science has to say about learning theory as a science.

Vapnik sees learning theory as a science: "The learning problem belongs to the problems of natural science" (298). Jaynes, "Probability Theory as Extended Logic", the logic of science.

It would seem to be difficult to self-apply their own theories. E.g., we don't use SLT techniques to learn new SLT techniques.

Up a level, which side is behaving more like the logical empiricists, and which more like Popper? Do they seek to falsify their theories, or to confirm them? Some assessment is done in terms of performance on tests, but there are disputes about the rules for tests. If you hold the record on the US postal data, what does this show? One side may say that being the most accurate classifier is not the ultimate goal. And even if both sides could agree on the test, there's a kind of Duhem-Quine problem: there's a space between principles, algorithms and implementation. Bayesians might say, for example, that they weren't acting in a principled way, that their algorithms haven't been developed enough yet, or that hardware is still too slow. If things don't work out, there are plenty of places to point the finger of blame.

Then there's always the thought that who's working the algorithms is important. "Radford Neal using SVMs would outperform the opposition".

Yet on the battle with frequentist statistics, Jaynes says: 'One can argue with a philosophy: it's not so easy with a computer printout, which says to us: "Independently of all your philosophy, here are the facts of actual performance."' Such comments, we see, should be treated with caution.

More promising is an assessment of which group is producing the most dynamic ideas and the most productive problem shifts, i.e., which has the greater resources. The best chance of philosophy of science being of use is through a MacIntyrean analysis of the rivalry.

The idea of tradition-based rationality could be used to describe the Popper/Bayesian debate as regards machine learning. Note that it is not necessary that one be a clear winner at any moment.

Agenda: Provide the context for the debate. Remind both sides that there's no spot rationality to decide which of the rivals it is most rational to join, but that we can strive to give the best ongoing assessment of their relative strengths. Even then there may be no winner. Ideally, there would be an account of the common ground between the rivals, then a recognition that each tradition has its own criteria to decide progress. What we can expect of each rival is a clear statement of its principles, what it considers to be the path by which it overcame obstacles, which are its greatest successes and what in its terms are the largest open problems confronting it. Also, we need an account of what it takes to be the strengths of the rival, and whether it can understand these in its own terms; and of the weaknesses of the rival and how it understands why they should arise.

We ought to encourage some members of each tradition to learn the other's language as a second language, or even as a second first language.

Outcomes: no result, pressure on a rival, surpassing, merging. There is no problem with the co-existence of rival traditions. Indeed, conflict should be seen as an opportunity to rethink one's own principles, a chance for a form of falsification, potentially leading to a creative reformulation: in sum, an opportunity which should be taken. In some ways the promotion of this form of rationality is not so much a matter of trying to beat the other side as of holding a mirror up to it. The other party might claim that your mirror is distorting, but they might also have a moment of insight into failings they did not realise they had.

My work with the higher-dimensional algebra community in mathematics tries to do something similar. Admittedly, there the opposition is more nebulous, operating by, say, anonymously rejecting articles. Philosophy of mathematics missed out on its Kuhn phase, and is now in a deadening realism/nominalism phase. I have tried to revive it by looking at a range of new problems. See here.

To some extent philosophers have taken part in debates between rival traditions, but typically as signed up representatives of sides. Then, there is little acknowledgement of own side's weaknesses.

One large problem is in identifying what you mean by a certain tradition. Of course, there are differences of opinion amongst statistical learning theorists or amongst Bayesians (e.g., concerning the role of MaxEnt). On the other hand, traditions transform themselves and split.


Last revised: February 15, 2005.