Why concurrent programming is hard
My bet is on Channels. Nobody is trying to abandon mutable state, or nobody should, at least. If we abandoned it, our computers would be static screens where nothing happened. Functional programmers embrace mutable state, but only where it's needed!
If the program is faster or cleaner using mutable state, why not? The dream is typically, as I understand it, that the compiler can figure out all of the places where mutable state is required.
Don't get me wrong, I realize most folks are more pragmatic than they are dogmatic. However, I also know that all of my coworkers and most folks I have talked to are obsessed with "persistent" collections, often touting them as some sort of must-have for reliable multithreaded work.
Remember, being irrationally passionate about topics is common human nature :P. See, for evidence, any of my posting history on this and related topics. The simplicity comes in particular from a well-defined hierarchy of objects and the locking semantics upon them. No wonder that concurrent programs, especially those written in OOP languages where the mutable state is hidden, spread around in a really disorganized way, and de facto intermixed with the code, end up a mess.
Sophisticated highly concurrent systems mitigate many of these problems by taking control of the scheduling order to dynamically optimize behavior. This eliminates quite a bit of coordination and agreement that you see in a lot of concurrent software. There is some interesting research on game theoretic adaptive schedulers for highly concurrent systems. Basically, if a scheduler in one process can model the behavior of schedulers in all other processes it interacts with, it can dynamically choose an operation with very low or zero probability of conflicting with the decisions made by all the other schedulers.
In principle, this leads to all the schedulers getting maximum work done with minimum coordination or conflict. Some algorithms I built for massively parallel systems work this way. At some level of concurrency, agreement and coordination become impractical, so you have to come up with techniques where the individual processes make choices that ensure they do not step on each other by making assumptions about the processes they interact with.
There are only 2 hard problems in Computer Science: naming things, concurrency, and off-by-one errors. And cache invalidation. Maybe you removed it too early. Concurrent programming would be hard even if we had great abstractions, because concurrent programming means maintaining exponentially many possible states.
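To make that blow-up concrete, here is a minimal Haskell sketch (my own illustration, not taken from any system discussed here) that just counts the possible interleavings of two threads' steps; the step names a1..a5 and b1..b5 are hypothetical placeholders for whatever operations the threads actually perform.

    -- Counting interleavings of two independent threads.
    -- Two threads of n and m atomic steps admit C(n+m, n) interleavings,
    -- each a distinct execution history your program must be correct under.
    interleavings :: [a] -> [a] -> [[a]]
    interleavings xs []         = [xs]
    interleavings [] ys         = [ys]
    interleavings (x:xs) (y:ys) =
      map (x :) (interleavings xs (y:ys)) ++
      map (y :) (interleavings (x:xs) ys)

    main :: IO ()
    main = do
      let threadA = ["a1", "a2", "a3", "a4", "a5"]  -- hypothetical atomic steps
          threadB = ["b1", "b2", "b3", "b4", "b5"]
      -- 5 steps per thread already gives C(10,5) = 252 interleavings;
      -- 20 steps each would give roughly 137 billion.
      print (length (interleavings threadA threadB))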
If you go down that route then there are few good tools for handling or even conceiving of such problems[0]. The other route is, I think, less well described as "abstraction" and more as restriction. For instance, Erlang's model simply ensures that while you're dealing with exponentially many program states, the kinds of interaction between those states mean that you are basically able to pretend they don't exist for most problems. With such a tool, you specify concurrent algorithms in terms of possible state transitions and assertions which should hold in all program states or subsets of those states.
A model checker then runs through all the states, checking your assertions; this can understandably take quite a while, so there's a distributed model checker for large systems. A really good takeaway from the talk is a point he made on software testing: for sequential algorithms, conventional software testing is likely to find most of the bugs you really care about, but for concurrent algorithms a bug may only appear in a few of the exponentially many possible interleavings, which ordinary test runs are unlikely to hit. Given the speed of computers today, it's only a matter of time before these bugs are hit and your system implodes in a confusing and impossible-to-reproduce way.
Software testing by itself just isn't good enough anymore. My personal gripe is the tension between how we segment our domain abstractions and how we segment our concurrency abstractions. That is, what makes your domain most understandable is not necessarily what makes it most amenable to concurrency.
Indeed, it may be directly opposed to this. This is particularly obnoxious in models where folks try to have a single Foo object that is used everywhere anything representing even part of a Foo is used. This sounds flippant, but it's not. Unfortunately, I think that many people think that concurrency can be done in a library, or on non-concurrent systems like the JVM. I understand fully why Go would like to claim to be concurrent, and all the other languages would like to claim it, because the reality is we're in a multi-core, multi-node, distributed world.
But don't fall for it. Worse, there are some simple, low-hanging "concurrent" fruit like goroutines that make it seem like you've solved the problem. I would have no problem if someone had made a language and VM (or compiled-to-binary, like Go) that was truly concurrent and that copied the Erlang solution: no mutable state, supervision trees, let it fail, etc.
Seriously, steal from the best if you're making a new language. So far, nobody has. And since we live in a multi-node, multi-CPU world, you really should learn Erlang or Elixir, which adds some features and has an easier syntax.
Seriously, I know it's much easier to go with the flavor-of-the-month and pretend that there is no downside. But sooner or later -- hopefully sooner, because it will be much more painful later -- you have to pay the piper for not being concurrent. We've seen massive failures and re-architecting of systems as a result -- Twitter survived years of fail whales caused by its poor initial engineering. Can you count on enough VC money to do the same?
PS: not to pick on Go. Go has several features that are really bright choices. I am just using it as an example here; I've heard the same claims about everything up to and including Node. That's a fairly bold claim. Erlang is good at the actor model. It's not an all-around winner in concurrency.
Another shining example is the Glasgow Haskell Compiler runtime system. It has an exceptionally fast and efficient green thread system that in the latest version of GHC can scale almost-linearly up to 32 cores, even with some resource contention, and can easily handle millions of concurrent threads.
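As a rough illustration of how cheap those green threads are (the thread count and structure here are my own sketch, not a benchmark from the GHC team), the following program forks a million forkIO threads that each bump a shared counter and then waits for the last one to finish:

    import Control.Concurrent
    import Control.Monad (replicateM_)

    -- Fork a large number of green threads; each one increments a shared
    -- counter, and the thread whose increment reaches n signals completion.
    main :: IO ()
    main = do
      let n = 1000000 :: Int        -- a million forkIO threads is routinely feasible
      counter <- newMVar 0
      done    <- newEmptyMVar
      replicateM_ n $ forkIO $
        modifyMVar_ counter $ \c -> do
          let c' = c + 1
          if c' == n then putMVar done () else pure ()
          pure c'
      takeMVar done                 -- block until the last thread has run
      readMVar counter >>= print    -- prints 1000000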
I suspect you could emulate Erlang actors with very little loss of efficiency. And, of course, you're also free to use a thread-based model. And since it is a language with type-system-encapsulated mutability and very nice resource-contention-management mechanisms like STM, concurrency is often "free" and very safe. I don't know, that doesn't seem true and practical to me. There are many other closed-source applications in the industry that reliably handle concurrency using Erlang. Otherwise yes, anyone can make a thread work with a thread-safe queue -- look, Erlang!
Not Erlang. Yes it is. The winner in concurrency is the system that has the most fault-tolerance tooling built in. That is the true secret. Specifically, libraries work no matter how they're being executed, concurrently or otherwise, and they're fault tolerant in the sense that error conditions are encapsulated in the type system and the libraries do not generally crash.
Well, not necessarily. I can express an actor however I want. Haskell forks aren't like regular C forks; they're much more flexible. It's basically the same thing as actors in Erlang. The thread only gets picked up when something happens to a resource it's waiting on, like actors in Erlang.
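A minimal sketch of that idea (the Msg type and the mailbox-per-actor structure are my own assumptions, not an established library): the forked thread blocks on its mailbox and only gets scheduled again when a message shows up, much like a waiting Erlang actor.

    import Control.Concurrent
    import Control.Concurrent.Chan

    -- A hypothetical message type for the illustration.
    data Msg = Greet String | Stop

    -- The "actor": sleeps on readChan until a message arrives,
    -- handles it, and loops -- it consumes no CPU while idle.
    actor :: Chan Msg -> IO ()
    actor mailbox = do
      msg <- readChan mailbox
      case msg of
        Greet name -> putStrLn ("hello, " ++ name) >> actor mailbox
        Stop       -> putStrLn "actor stopping"

    main :: IO ()
    main = do
      mailbox <- newChan
      _ <- forkIO (actor mailbox)
      writeChan mailbox (Greet "world")
      writeChan mailbox Stop
      threadDelay 100000  -- crude pause so the actor can finish printing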
Well, Haskell has excellent fault tolerance support, but the recommended programming style is just to never crash, which is an entirely reasonable request when you're writing in Haskell.
There's no particular reason it would. Not a user of Haskell, but interested in it. What makes my spider-sense tingle in Haskell is laziness. It's all cool until you unexpectedly hit that lazy unroll (not sure what the correct term is) in a time-critical system. Are there ways to control laziness in Haskell, just like there are ways to control mutable state? Laziness is just the default, and most likely to terminate, evaluation strategy. You can use a number of techniques, the simplest being bang patterns, to enforce non-lazy evaluation strategies, including strict evaluation, parallel evaluation, sequential evaluation, etc.
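A small sketch of what that looks like in practice (the functions are my own toy examples): the bang pattern forces the accumulator at every step instead of letting a chain of thunks pile up, and seq gives the same control without the extension.

    {-# LANGUAGE BangPatterns #-}

    -- Strict accumulator via a bang pattern: !acc is evaluated at each step,
    -- so no chain of unevaluated thunks (the "lazy unroll") builds up.
    sumStrict :: [Int] -> Int
    sumStrict = go 0
      where
        go !acc []     = acc
        go !acc (x:xs) = go (acc + x) xs

    -- The same effect using seq, with no language extension.
    sumSeq :: [Int] -> Int
    sumSeq = go 0
      where
        go acc []     = acc
        go acc (x:xs) = let acc' = acc + x in acc' `seq` go acc' xs

    main :: IO ()
    main = do
      print (sumStrict [1 .. 1000000])
      print (sumSeq    [1 .. 1000000])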
However, there are very few activities where Haskell's laziness will cause disruptive timing variations. Any language that you might use instead of Haskell is probably heavily heap-oriented, and likely garbage collected, which means that you're already dealing with unpredictable timing variance.
Of course; careful control of evaluation strategies is one of the methods by which parallelism is accomplished. At the simpler level, parameters and items in records can be strictly evaluated. However, once you get used to it, you start to find that strict evaluation is rarely useful. I tried this code with GHC 7 (a sketch of the kind of program in question is below). What gives? The whole process dies before "thing2" gets a chance to do anything.
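The snippet itself did not survive in this copy; what follows is a hypothetical reconstruction of the kind of program being described, not the commenter's exact code: two forked threads and a main that returns immediately.

    import Control.Concurrent

    main :: IO ()
    main = do
      _ <- forkIO (putStrLn "thing1")
      _ <- forkIO (putStrLn "thing2")
      return ()
      -- main returns, the runtime shuts down, and any forked thread that
      -- hasn't run yet (often "thing2", sometimes both) is silently killed.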
The idea is that if the main thread is done and not waiting, we don't really care about what any of the child threads are doing, so we just axe them. Hmm, ok. I tried adding an infinite loop after the forkIO calls; I think Haskell's concurrency must be very different from the kinds I'm familiar with. The parallelism (or not) of GHC concurrency depends on various options; if you don't link with the "-threaded" option, you are running green threads on a single native thread (N:1 instead of M:N).
And Haskell threads aren't preempted except on memory allocation, so a tight loop that doesn't allocate is never preempted, and the main thread here is never forced to yield control. It's very different from most concurrency systems. It can be easier to reason about a strictly deterministic system.
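For what it's worth, the usual fix is not to spin in a loop at all but to have main block on something the children fill in. A sketch (the MVar-based signalling is just one common idiom; compiling with -threaded is only needed if you also want real parallelism):

    import Control.Concurrent
    import Control.Monad (replicateM_)

    main :: IO ()
    main = do
      done <- newEmptyMVar
      _ <- forkIO (putStrLn "thing1" >> putMVar done ())
      _ <- forkIO (putStrLn "thing2" >> putMVar done ())
      -- main blocks here instead of busy-looping, so the children get
      -- scheduled and both messages are printed before the program exits.
      replicateM_ 2 (takeMVar done)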
However, such a language or framework might restrict the expressiveness so much that not all conceivable applications can be expressed in it. To give a few examples, the futures of Clojure and Java are blocking, which always introduces the risk of deadlocks when other blocking abstractions are used in conjunction.
The futures offered by AmbientTalk and E, called promises, are inherently non-blocking to fit into the overall nature of these two languages as being non-blocking and deadlock-free. Consequently, however, the two kinds of futures are used differently, and one might argue that one is preferable over the other in certain situations.
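The blocking-futures risk can be shown in one self-contained Haskell sketch (the names futureA/futureB and the MVar encoding of a "future" are my assumptions, not any particular library): each worker blocks on the other's result before producing its own, so neither value is ever written.

    import Control.Concurrent

    -- Blocking "futures" modelled as MVars: readMVar blocks until a value
    -- is available. Two mutually dependent futures deadlock immediately.
    main :: IO ()
    main = do
      futureA <- newEmptyMVar :: IO (MVar Int)
      futureB <- newEmptyMVar :: IO (MVar Int)
      _ <- forkIO $ do
             b <- readMVar futureB   -- waits for B, which never arrives
             putMVar futureA (b + 1)
      _ <- forkIO $ do
             a <- readMVar futureA   -- waits for A, which never arrives
             putMVar futureB (a + 1)
      -- Main blocks as well; GHC's runtime will usually detect this and
      -- throw BlockedIndefinitelyOnMVar rather than hang forever.
      print =<< readMVar futureA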
Similar is the situation when it comes to the concrete implementation of communicating sequential processes. Personally, I consider the strict isolation between processes, and therefore the enforcement that any form of communication has to go via the explicit channels, as a major property that can simplify reasoning about the concurrent semantics and for instance makes sure that programs are free from low-level data races.
However, Go, for instance, chose to adopt the notion of communicating via channels, but its goroutines are not isolated (a small sketch of what that means follows below). JCSP, a Java library, goes the same way. The occam-pi language, on the other hand, chose to stick with the notion of fully isolated processes. The same design discussion could be had for implementations of the actor model.
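To make the Go point concrete with a small Haskell sketch (my own illustration; Haskell's Chan stands in for a Go channel here): sending a mutable reference over a channel does not isolate the two threads, they simply share state through it, which is exactly what fully isolated processes rule out.

    import Control.Concurrent
    import Control.Concurrent.Chan
    import Data.IORef

    main :: IO ()
    main = do
      chan   <- newChan :: IO (Chan (IORef Int))
      shared <- newIORef 0
      _ <- forkIO $ do
             ref <- readChan chan
             writeIORef ref 42      -- the receiver mutates the sender's state
      writeChan chan shared
      threadDelay 100000            -- crude wait for the forked thread
      readIORef shared >>= print    -- the sender observes 42: no isolation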
AmbientTalk and Erlang go with fully isolated processes, while, for instance, Akka makes the pragmatic decision that it cannot guarantee isolation because it is running on top of a JVM. This discussion could go on for quite a while. Wikipedia currently lists more than 60 concurrent programming languages, most of which implement some specific variation of a concept. In previous work, we identified roughly a hundred concepts that are related to concurrent programming.
It can now of course be argued that a single language will not support all of them and thus applications will perhaps only have to cope with a handful of concurrent programming abstractions. However, looking at large open-source applications such as IDEs, it seems that the various subsystems from time to time start to introduce their own abstractions.
They seem to implement something along the lines of STM, in varying degrees of complexity. And this again raises the question of how these different abstractions interact with each other.
While most of these bugs are marked as closed, it is probably a good indication that concurrent programming is hard and error-prone. Usually, concurrent programming is considered hard because low-level abstractions such as threads and locks are used. While NetBeans uses these to a significant extent, it also uses considerably more high-level concepts such as futures, asynchronous tasks, and STM. Now I would argue that it is not necessarily the abstraction level, but rather that various concurrent programming abstractions are used together even though they have not been designed for that purpose.
While each abstraction in isolation is well tailored for its purpose, and thus reduces the accidental complexity, concurrency often does not remain confined to modules or subsystems, and thus the interaction between the abstractions causes significant accidental complexity. As far as I am aware, few languages have been designed from the ground up with concurrency in mind, and even fewer languages are designed with the interaction of concurrent programming abstractions in mind. While, for instance, Java was designed with threads in mind and has the synchronized keyword to facilitate thread-based programming, its memory model and the java.util.concurrent libraries were only added later.
Clojure, by contrast, was designed from the start with concurrent programming in mind. It started out with atoms, agents, and STM to satisfy the different use cases for concurrency. However, even though Clojure was designed with them from the start, they do not interact well.
Atoms are considered low-level and do not regard transactions or agents at all. STM on the other hand accounts for agents by deferring message sends until the end of a transaction to make sure that a transaction can be retried safely.
With only these three abstractions, Clojure actually could be considered a fine example. However, these abstractions were apparently not sufficient to cover all use cases equally well, and futures and promises, as well as CSP in the form of the core.async library, were added later.
Unfortunately, the abstractions were not designed to integrate well with the existing ones. Instead, they were merely added, and the interactions can cause, for instance, unexpected deadlocks or race conditions (for more details, see this paper). In order to give a more academic example, which might not be governed by mostly pragmatic concerns, Haskell might be a reasonable candidate.
Unfortunately, even in Haskell the notion of adding instead of integrating seems to be the prevalent one. I am not a Haskell expert, but its STM shows the same symptoms Clojure's has, although in a slightly different way.
The standard Control.Concurrent package comes, for instance, with MVar and Chan as abstractions for mutable state and communication channels, while the STM library provides its own separate variants such as TVar and TChan instead of integrating with them. It might be performance reasons that led to this situation. However, from the perspective of engineering large applications this can hardly be ideal, because the question of whether these abstractions can be used together without problems in the same application remains unanswered.
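A small sketch of what that non-integration means in practice (transferToTVar is a made-up helper for illustration): MVar operations live in IO while STM operations live in the STM monad, so the two cannot be combined into a single atomic transaction.

    import Control.Concurrent.MVar
    import Control.Concurrent.STM

    -- Take a value out of an MVar and add it to a TVar. The takeMVar is an
    -- IO action and cannot appear inside `atomically`, so the combined
    -- operation is two separate steps and is not atomic as a whole.
    transferToTVar :: MVar Int -> TVar Int -> IO ()
    transferToTVar mvar tvar = do
      x <- takeMVar mvar
      atomically $ modifyTVar' tvar (+ x)
      -- An exception between the two steps would lose the value taken
      -- from the MVar; STM alone cannot protect the MVar half.

    main :: IO ()
    main = do
      m <- newMVar 41
      t <- newTVarIO 1
      transferToTVar m t
      readTVarIO t >>= print  -- 42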
I think that concurrent programming is hard because the abstractions we use today are not prepared for the job. They are good for one specific task, but they are not easily used in conjunction with each other. Instead, interactions can lead, for instance, to unexpected race conditions or deadlocks. And just to support the claim that interaction is an issue, it is not just NetBeans that uses a variety of concurrent programming concepts.
Eclipse looks similar, and so do MonoDevelop and SharpDevelop. A study in the Scala world also suggests that application developers choose to combine the actor model with other abstractions, for instance for performance reasons.