Talk:Neural network (machine learning)

History section: request to approve edits of 15-16 September 2024


As discussed with User:North8000, on 15-16 September 2024, I edited Neural network (machine learning). He reverted and wrote, "you are doing massive reassignment of credit for Neural Networks based on your interpretation of their work and primary sources and deleting secondary sourced assignments. Please slow down and take such major reassignments to talk first." So here. Please note that most of my edits are not novel! They resurrect important old references deleted on 7 August 2024 in a major edit when User:Cosmia Nebula (whose efforts I appreciate) tried to compress the text. This massive edit has remained unchallenged until now. I also fixed links in some of the old references, added a few new ones (both primary and secondary sources), corrected many little errors, and tried to streamline some of the explanations. IMO these edits restored important parts and further improved the history section of the article, although a lot remains to be done. Now I kindly ask User:North8000 and User:Cosmia Nebula, who seem to know a lot about the subject: please review the details once more and revert the revert! Best regards, Speedboys (talk) 21:24, 17 September 2024 (UTC)[reply]

Recapping my response from our conversation at my talk page: Thanks for your work and your post. The series of rapid-fire edits ended up being entangled where they can't be reviewed / potentially reverted separately. In that bundle were several which IMO pretty creatively shifted/assigned credit for being the one to pioneer various aspects. So I'm not open to reinstating that whole linked bundle including those. Why not just slow down and put those things back in at a pace where they can be reviewed? And the ones that are a reach (transferring or assigning credit for invention), take to talk first. You are most familiar with the details of your edits and are in the best position to know those. Sincerely, North8000 (talk) 00:35, 18 September 2024 (UTC)[reply]

Hello @Speedboys
My main concerns with the page were: 1. It had too many details that probably should go into History of artificial neural networks. 2. It relies too heavily on Schmidhuber's history, especially "Annotated History of Machine Learning", and Schmidhuber is an unreliable propagandist who bitterly contests priority with everyone else. He aims to show that modern deep learning mostly originated with his team, or with others like Lapa and Fukushima etc., specifically *not* LeCun, Bengio, etc. You can press ctrl+f and type "did not" and find phrases like "This work did not cite the earlier LSTM", "which the authors did not cite", "extremely unfair that Schmidhuber did not get the Turing award"...
It is even more revealing if you ctrl+f on "Hinton". More than half of the citations to Hinton are followed by "Very similar to [FWP0-2]", "although this type of deep learning dates back to Schmidhuber's work of 1991", "does not mention the pioneering works", "The authors did not cite Schmidhuber's original"... You can try the same exercise by ctrl+f on "LeCun" and "Bengio". It is very funny.
His campaign reached levels of absurdity when he claimed that Amari (1972)'s RNN is "based on the (uncited) Lenz-Ising recurrent architecture". If you can call the Ising model "the first non-learning recurrent NN architecture", then I can call the heat death of the universe "the first non-evolving model of Darwinian evolution". The entire point of an RNN is that it is dynamic, and the entire point of the Ising model is that it is about thermal equilibrium at a point where all dynamics have *stopped*.
As one example, the phrase "one of the most important documents in the history of machine learning" used to appear several times across Wikipedia; it is an obvious violation of WP:NPOV and came straight from his "Annotated History of Machine Learning". I removed all instances of this phrase in Wikipedia except on his own page (he is entitled to his own opinions). In fact, the entire paper is scattered with such propagandistic sentences:
> [DEC] J. Schmidhuber (AI Blog, 02/20/2020, updated 2021, 2022). The 2010s: Our Decade of Deep Learning / Outlook on the 2020s. The recent decade's most important developments and industrial applications based on the AI of Schmidhuber's team, with an outlook on the 2020s, also addressing privacy and data markets.
As a general principle, if I can avoid quoting Schmidhuber, I must, because Schmidhuber is extremely non-NPOV. I had removed almost all citations to his Annotated History except those that genuinely cannot be found anywhere else. For example, I kept all citations to that paper about Amari and Saito, because 1. H. Saito is so extremely obscure that if we don't cite Schmidhuber on this, we have no citation for this. 2. I can at least trust that he didn't make up the "personal communication" with Amari.
> [GD2a] H. Saito (1967). Master's thesis, Graduate School of Engineering, Kyushu University, Japan. Implementation of Amari's 1967 stochastic gradient descent method for multilayer perceptrons.[GD1] (S. Amari, personal communication, 2021.) pony in a strange land (talk) 01:36, 18 September 2024 (UTC)[reply]
Dear User:Cosmia Nebula alias "pony in a strange land," thanks for your reply! I see where you are coming from. The best reference to the mentioned priority disputes between Jürgen Schmidhuber, Geoffrey Hinton, Yoshua Bengio, and Yann LeCun (JS,GH,YB,YL) is the very explicit 2023 report[1] which to my knowledge has not been challenged. The most comprehensive surveys of the field are those published by JS in 2015[2] and 2022,[3] with over 1000 references in total; wouldn't you agree? They really credit the deep learning pioneers, unlike the surveys of GH/YB/YL.[4][5] I'd say that JS has become a bit like the chief historian of the field, with the handicap that he is part of it (as you wrote: non-NPOV?). Anyway, without his surveys, many practitioners would not even know the following facts: Alexey Ivakhnenko had a working deep learning algorithm in 1965. Shun'ichi Amari had Deep Learning by Stochastic Gradient Descent in 1967. Kunihiko Fukushima had ReLUs in 1969, and the CNN architecture in 1979. Shun'ichi Amari had Hopfield networks 10 years before Hopfield, plus a sequence-learning generalization (the "dynamic RNN" as opposed to the "equilibrium RNN" you mentioned), all using the must-cite Ising architecture (1925). Alan Turing had early unpublished work (1948) with "ideas related to artificial evolution and learning RNNs."[3] Seppo Linnainmaa had backpropagation (reverse mode of auto-diff) in 1970. G.M. Ostrovski republished this in 1971. Henry J. Kelley already had a precursor in 1960. Two centuries ago, Gauss and Legendre had the method of least squares, which is exactly what's now called a linear neural network (only the name has changed). If JS is non-NPOV (as you write), then how non-NPOV are GH/YB/YL who do not cite any of this? You blasted JS' quote, "one of the most important documents in the history of machine learning," which actually refers to the 1991 diploma thesis of his student Sepp Hochreiter, who introduced residual connections or "constant error flow," the "roots of LSTM / Highway Nets / ResNets."[3] Anyway, thanks for toning that down. You deleted important references to JS' 1991 work on self-supervised pre-training, neural network distillation, GANs, and unnormalized linear Transformers; I tried to undo this on 16 Sept 2024. Regardless of the plagiarism disputes, one cannot deny that this work predates GH/YB and colleagues by a long way. In the interest of historical accuracy, I still propose to revert the revert of my 10 edits, and continue from there. In the future, we could strive to explicitly mention details of the priority disputes between these important people, trying to represent all sides in an NPOV way. I bet you could contribute a lot here. What do you think? Speedboys (talk) 14:21, 18 September 2024 (UTC)[reply]
The most important issue is that any citation to Schmidhuber's blog posts, essays, and "Annotated History" invariably taints a Wikipedia page with non-NPOV. Before all those details, this is the main problem with citing Schmidhuber. Citing earlier works is fine, but it is *NOT* fine to cite Schmidhuber's interpretation of these earlier works.
"Annotated History of Modern AI and Deep Learning" was cited about 63 times, while "Deep learning in neural networks: An overview" was cited over 22k times. It is clear why if you compare the two. The "Deep learning in neural networks" is a mostly neutral work (if uncommonly citation-heavy), while the "Annotated History" is extremely polemical (even beginning the essay with a giant collage of people's faces and their achievements, recalling to mind the book covers from those 17th century Pamphlet wars). It is very strange that you would combine them in one sentence and say "with over 1000 references in total" as if they have nearly the same order of magnitude in citation.
As for the "very explicit 2023 report", it is... not a report. It is the most non-NPOV thing I have seen (beginning the entire report with a damned caricature comic?) and I do not want to read it. He is not the chief historian. He is the chief propagandist. If you want better history of deep learning I would rather recommend something else, such as:
  • The quest for artificial intelligence: a history of ideas and achievements, by Nilsson, Nils J.
  • Mikel Olazaran, A Historical Sociology of Neural Network Research (PhD dissertation, Department of Sociology, University of Edinburgh, 1991); Olazaran, "A Sociological History of the Neural Network Controversy", Advances in Computers, Vol. 37 (1993), 335-425.
  • Anderson, James A., and Edward Rosenfeld, eds. Talking Nets: An Oral History of Neural Networks. MIT Press, 2000.
Calling something "unnormalized linear Transformers" is a great rhetorical trick, and I can call feedforward networks "attentionless Transformers". I am serious. People are trying to figure out if attention really is necessary (for example, "Sparse MLP for image recognition: Is self-attention really necessary?" or MLP-mixers). Does that mean feedforward networks are "attentionless Transformers"? Or can I just put Rosenblatt into the Transformer page's history section?
The Ising architecture (1925) is NOT a must-cite. It is not even a neural network architecture (you can retroactively call it an "architecture", but historians call that presentism). Physicists don't cite Newton when they write new papers. They don't even cite Schrödinger. Mathematicians don't cite Gauss and Legendre for least squares. They have a vague feeling that they did something about least squares, and that's enough. It is not a serious problem. Historians will do all that detailed credit assignment later.
Ising architecture is NOT a must-cite even in the 1970s, because, as you might notice in the RNN page, there were several ways to arrive at RNN. One route goes through neuroanatomy. The very first McCulloch and Pitts 1943 paper already had RNN, Hebbian learning, and universality. They had no idea of Ising, nor did they need to, because they got the idea from neuroscientists like Lorente de No. Hopfield cited Amari, btw.
Schmidhuber is not reliable, by the way. I just checked his "Deep learning in neural networks" and immediately saw an error: "Early NN architectures (McCulloch and Pitts, 1943) did not learn." In fact, it is stated right there in the paper:
> We suppose that some axonal terminations cannot at first excite the succeeding neuron; but if at any time the neuron fires, and the axonal terminations are simultaneously excited, they become synapses of the ordinary kind, henceforth capable of exciting the neuron.
That is Hebbian learning (6 years before Hebb's 1949 book, but... Hebbian learning was an immediately obvious idea once you have associationism with the neuron doctrine).
You can find it if you ctrl+f "learn" in the paper. A little later they showed that Hebbian learning in a feedforward network is equivalent to an RNN by unrolling that RNN in time ("THEOREM VII. Alterable synapses can be replaced by circles." and Figure 1.i; the dashed line is the learnable synapse).
But I am tired of battling over the historical minutiae. Misunderstanding history doesn't hurt practitioners, because ideas are cheap and are rediscovered all the time (see: Schmidhuber's long list of grievances), so not citing earlier works is not an issue. This is tiring, and I'm signing out of the debate. A word of advice: if you must use Schmidhuber's history, go directly to the source. Do not use his interpretation. @Speedboys, you seem passionate about history. It would be good to actually read the primary sources, not to trust Schmidhuber's interpretation, and to read histories other than his. Other than the references I gave above, I can also recommend https://gwern.net/tank as a good example of a microhistory of a specific problem in neural network research. pony in a strange land (talk) 22:07, 18 September 2024 (UTC)[reply]
Dear User:Cosmia Nebula, thanks! I am always going to the source when I find something of interest in a survey. You condemn JS and recommend alternative surveys such as Nils John Nilsson (2009). Unfortunately, Nilsson is not a very good source because he writes things such as, "Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams (1985), introduced a new technique, called back propagation," without mentioning the true inventors of backpropagation. He also writes that "the physicist John J. Hopfield" invented the Hopfield network, without citing Amari, who published it 10 years earlier. Neither Nilsson nor the even older surveys you mention cite Ivakhnenko, who started deep learning in 1965. Isn't that a rather US-centric non-NPOV? Most of the community learned about the true pioneers from JS' much more meticulous surveys, which you criticize. See my previous message. His 2015 survey lists nearly 900 references, his 2022 update over 500, adding material that has become important since 2015 (this is not about citation counts). Could it be that you have a tiny little bit of non-NPOV of your own? Maybe we all have. But then let's find a consensus. You call "unnormalized linear Transformers" a "great rhetorical trick." Why? Unlike the older networks you mention, they do have linearized attention and scale linearly. The terminology "linear Transformer" is due to Katharopoulos et al. (2020), but JS had the machinery already in 1991, as was pointed out in 2021 (see reverted edits). You also claim that early NN architectures (McCulloch and Pitts, 1943) did learn. I know the paper, and couldn't find a working learning algorithm in it. Could you? Note that Gauss and Legendre had a working learning algorithm for linear neural nets over 200 years ago, another must-cite. Anyway, I'll try to follow the recommendations on this talk page and go step by step from now on, in line with WP:CONSENSUS. Speedboys (talk) 11:58, 19 September 2024 (UTC)[reply]
The 2024-09-16 diff under discussion: The original version of the "Early Work" section has a very good and accessible overview of the field, and it wikilinks related subjects in a rather fluid way. I think your version of that section, by going deep into crediting and describing a single primary source on each topic, just doesn't work. As noted above, such a fine-grained step-by-step review of primary works of the history is better suited to the History of neural networks subarticle.
I don't know the sources on this at all, but I lend support to the editors above for at least this section: on prose, accessibility, and accuracy in a broader conceptual sense, you should not restore your edits wholesale. (I know it's a lot of work, as writing good accessible prose is super hard, but the hardest part -- finding and understanding the source material -- you've already done and banked, so you should definitely keep up editing on this and the many related articles.) SamuelRiv (talk) 19:30, 18 September 2024 (UTC)[reply]
Speedboys, whatever else may be the case, I don't think that you should "revert the revert... and continue from there." WP:CONSENSUS is sufficiently against the content that you had added that it should not be reverted back in the same form. Please follow the advice of other editors above, and propose specific text to add back, here in talk. --Tryptofish (talk) 18:58, 18 September 2024 (UTC)[reply]

Dear SamuelRiv and Tryptofish, thanks. I'll try to follow your recommendations and go step by step from now on, in line with WP:CONSENSUS. Speedboys (talk) 11:58, 19 September 2024 (UTC)[reply]

Dear all, please review my first proposed edit in line with WP:CONSENSUS. I propose to replace the section "Neural network winter" with the section "Deep learning breakthroughs in the 1960s and 1970s" below. Why? The US "neural network winter" (if any) did not affect Ukraine and Japan, where fundamental breakthroughs occurred in the 1960s and 1970s: Ivakhnenko (1965), Amari (1967), Fukushima (1969, 1979). The Kohonen maps (1980s) should be moved to a later section. I should point out that much of the proposed text is based on older, resurrected text written by other editors. Speedboys (talk) 12:17, 19 September 2024 (UTC)[reply]

Deep learning breakthroughs in the 1960s and 1970s


Fundamental research was conducted on ANNs in the 1960s and 1970s. The first working deep learning algorithm was the Group method of data handling, a method to train arbitrarily deep neural networks, published by Alexey Ivakhnenko and Lapa in Ukraine (1965). They regarded it as a form of polynomial regression,[6] or a generalization of Rosenblatt's perceptron.[7] A 1971 paper described a deep network with eight layers trained by this method,[8] which is based on layer-by-layer training through regression analysis. Superfluous hidden units are pruned using a separate validation set. Since the activation functions of the nodes are Kolmogorov-Gabor polynomials, these were also the first deep networks with multiplicative units or "gates."[3]
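
For reviewers, to make the training scheme above concrete (illustrative only, not proposed for the article): a minimal sketch of GMDH-style layer-by-layer fitting, assuming NumPy, quadratic two-input Kolmogorov-Gabor candidate units, and pruning by validation error; the helper names design and gmdh_layer are hypothetical, not from the cited sources.

    # GMDH-style layer-by-layer fitting: a hedged sketch, not code from the cited sources.
    import numpy as np
    from itertools import combinations

    def design(x1, x2):
        # Terms of a Kolmogorov-Gabor quadratic polynomial of two inputs.
        return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])

    def gmdh_layer(X_tr, y_tr, X_va, y_va, keep=4):
        units = []
        for i, j in combinations(range(X_tr.shape[1]), 2):
            # Fit each candidate unit by least-squares regression on the training set.
            coef, *_ = np.linalg.lstsq(design(X_tr[:, i], X_tr[:, j]), y_tr, rcond=None)
            val_err = np.mean((design(X_va[:, i], X_va[:, j]) @ coef - y_va) ** 2)
            units.append((val_err, i, j, coef))
        units.sort(key=lambda u: u[0])   # rank candidates by error on the separate validation set
        best = units[:keep]              # prune the "superfluous" units
        new_tr = np.column_stack([design(X_tr[:, i], X_tr[:, j]) @ c for _, i, j, c in best])
        new_va = np.column_stack([design(X_va[:, i], X_va[:, j]) @ c for _, i, j, c in best])
        return best, new_tr, new_va      # kept outputs become the next layer's inputs

Calling gmdh_layer repeatedly on its own outputs yields an arbitrarily deep polynomial network, fitted one layer at a time by regression rather than by gradient descent.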

The first deep learning multilayer perceptron trained by stochastic gradient descent[9] was published in 1967 by Shun'ichi Amari.[10] In computer experiments conducted by Amari's student Saito, a five-layer MLP with two modifiable layers learned internal representations to classify non-linearly separable pattern classes.[3] Subsequent developments in hardware and hyperparameter tuning have made end-to-end stochastic gradient descent the currently dominant training technique.
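
Again only to make the mechanism concrete for reviewers (a modern restatement, not Amari's 1967 formulation; the gradient bookkeeping below is the later chain-rule/backpropagation style, and all names and constants are illustrative):

    # A tiny multilayer perceptron trained by per-sample stochastic gradient descent.
    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(0, 0.5, (2, 8)), np.zeros(8)   # hidden (modifiable) layer
    W2, b2 = rng.normal(0, 0.5, (8, 1)), np.zeros(1)   # output layer
    lr = 0.5

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # A toy non-linearly separable task (XOR).
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    Y = np.array([[0], [1], [1], [0]], dtype=float)

    for step in range(5000):
        i = rng.integers(len(X))              # one random sample per update: "stochastic"
        x, y = X[i:i+1], Y[i:i+1]
        h = sigmoid(x @ W1 + b1)              # internal representation
        out = sigmoid(h @ W2 + b2)
        d_out = (out - y) * out * (1 - out)   # gradient of squared error at the output
        d_h = (d_out @ W2.T) * h * (1 - h)    # propagate the error to the hidden layer
        W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
        W1 -= lr * x.T @ d_h;   b1 -= lr * d_h.sum(0)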

In 1969, Kunihiko Fukushima introduced the ReLU (rectified linear unit) activation function.[11][12][3] The rectifier has become the most popular activation function for deep learning.[13]
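
For completeness, the rectifier is simply f(x) = max(0, x), e.g.:

    import numpy as np

    def relu(x):
        # Rectified linear unit: passes positive inputs through, zeroes out the rest.
        return np.maximum(0.0, x)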

Nevertheless, research stagnated in the United States following the work of Minsky and Papert (1969),[14] who emphasized that basic perceptrons were incapable of processing the exclusive-or circuit. This limitation did not apply to the deep networks of Ivakhnenko (1965) and Amari (1967).

Deep learning architectures for convolutional neural networks (CNNs) with convolutional layers and downsampling layers began with the Neocognitron introduced by Kunihiko Fukushima in 1979, though not trained by backpropagation.[15][16]
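
To make the two named building blocks concrete for reviewers (a generic sketch, not Fukushima's Neocognitron; the 'valid' sliding-window correlation and 2x2 average pooling are my illustrative choices):

    import numpy as np

    def conv_layer(image, kernel):
        # Slide a small weight kernel over the input (cross-correlation, as used in CNNs).
        kh, kw = kernel.shape
        H, W = image.shape
        out = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    def downsample(fmap, k=2):
        # Downsampling layer: average non-overlapping k-by-k patches.
        H, W = fmap.shape
        fmap = fmap[:H // k * k, :W // k * k]
        return fmap.reshape(H // k, k, W // k, k).mean(axis=(1, 3))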

References

  1. ^ Schmidhuber, Juergen (14 December 2023). "How 3 Turing Awardees Republished Key Methods and Ideas Whose Creators They Failed to Credit. Technical Report IDSIA-23-23". IDSIA, Switzerland. Archived from the original on 16 December 2023. Retrieved 19 December 2023.
  2. ^ Schmidhuber, J. (2015). "Deep Learning in Neural Networks: An Overview". Neural Networks. 61: 85–117. arXiv:1404.7828. doi:10.1016/j.neunet.2014.09.003. PMID 25462637. S2CID 11715509.
  3. ^ a b c d e f Schmidhuber, Jürgen (2022). "Annotated History of Modern AI and Deep Learning". arXiv:2212.11279 [cs.NE].
  4. ^ LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey (2015). "Deep Learning" (PDF). Nature. 521 (7553): 436–444. Bibcode:2015Natur.521..436L. doi:10.1038/nature14539. PMID 26017442. S2CID 3074096.
  5. ^ Bengio, Yoshua; LeCun, Yann; Hinton, Geoffrey (2021). "Turing Lecture: Deep Learning for AI". Communications of the ACM.
  6. ^ Ivakhnenko, A. G.; Lapa, V. G. (1967). Cybernetics and Forecasting Techniques. American Elsevier Publishing Co. ISBN 978-0-444-00020-0.
  7. ^ Ivakhnenko, A.G. (March 1970). "Heuristic self-organization in problems of engineering cybernetics". Automatica. 6 (2): 207–219. doi:10.1016/0005-1098(70)90092-0.
  8. ^ Ivakhnenko, Alexey (1971). "Polynomial theory of complex systems" (PDF). IEEE Transactions on Systems, Man, and Cybernetics. SMC-1 (4): 364–378. doi:10.1109/TSMC.1971.4308320. Archived (PDF) from the original on 2017-08-29. Retrieved 2019-11-05.
  9. ^ Robbins, H.; Monro, S. (1951). "A Stochastic Approximation Method". The Annals of Mathematical Statistics. 22 (3): 400. doi:10.1214/aoms/1177729586.
  10. ^ Amari, Shun'ichi (1967). "A theory of adaptive pattern classifiers". IEEE Transactions on Electronic Computers. EC-16 (3): 299–307.
  11. ^ Fukushima, K. (1969). "Visual feature extraction by a multilayered network of analog threshold elements". IEEE Transactions on Systems Science and Cybernetics. 5 (4): 322–333. doi:10.1109/TSSC.1969.300225.
  12. ^ Sonoda, Sho; Murata, Noboru (2017). "Neural network with unbounded activation functions is universal approximator". Applied and Computational Harmonic Analysis. 43 (2): 233–268. arXiv:1505.03654. doi:10.1016/j.acha.2015.12.005. S2CID 12149203.
  13. ^ Ramachandran, Prajit; Barret, Zoph; Quoc, V. Le (October 16, 2017). "Searching for Activation Functions". arXiv:1710.05941 [cs.NE].
  14. ^ Minsky, Marvin; Papert, Seymour (1969). Perceptrons: An Introduction to Computational Geometry. MIT Press. ISBN 978-0-262-63022-1.
  15. ^ Fukushima, K. (1979). "Neural network model for a mechanism of pattern recognition unaffected by shift in position—Neocognitron". Trans. IECE (in Japanese). J62-A (10): 658–665.
  16. ^ Fukushima, K. (1980). "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position". Biol. Cybern. 36 (4): 193–202. doi:10.1007/bf00344251. PMID 7370364. S2CID 206775608.