This is a very interesting discussion, and I've considered and reconsidered the architecture of the two game-type iOS applications I've been working on many times. A lot of this depends on personal preference and the way your own brain is wired (especially if you're working alone), but from the comments I've read, I do think the added benefit of having at least some form of MVC is underestimated.
I always try to think the architecture of my projects through before I start implementing it, because over the years I've learned that just sitting down and hammering in code will bite you in the ass later on. Thinking longer and spending less time behind the computer has really paid off for me in many ways. Previously I'd be hacking around for hours to add new features, had to refactor many times, obfuscated my code beyond even my own recognition (which is a disaster and extremely off-putting if you leave stuff for a few weeks and then want to continue with it), etc. Nowadays I usually think for a few hours about what I want to do, what the pros and cons are, what kind of extensions I expect to add in the future, whether there's a possibility I'll want to port from iPhone to iPad or OS X or vice versa, etc. Actually implementing what I come up with, once I have it all in my head, then takes minutes instead of hours, or hours instead of days. I have virtually no bugs, don't need to refactor all the time, and it's much more effective and motivating to do it like that (that's what I think, at least).
Now on the topic of MVC, here's how I do it right now: like many already suggested, I also have an M-VC architecture instead of a real MVC, i.e. view and controller are merged. I pick one CCLayer in my CCScene to be the 'main' layer, use that to 'catch' touch and accelerometer events, and have it forward these to the CCScene. In that regard, the CCScene is my controller, and the set of layers it contains are my views. It would be even better if input events could go directly to the CCScene without indirection through CCLayer, but unfortunately that's not possible with Cocos2d, as CCScene is not a touch/accelerometer delegate. There is no game state or logic whatsoever in either of these two classes; they are only for presentation (the layers) and for initiating model updates (the scene). The CCScene is also used to store shared data related to the presentation side of the application, which is used by the CCLayers: think of the CCTexture2D objects used for sprite batch nodes, etc. My CCScene adds the CCLayers as children, and the CCLayers of course have a draw function to draw themselves. The CCScene also has a 'tick' function which is scheduled at regular intervals using CCDirector, which is where things get interesting.
The way the whole thing works together is like this: I have a model (world) with a number of entities (actors), and it's completely autonomous. I don't want to go into much detail about the specifics of my game (at least not yet), but the model is roughly this: a 'world' with a set of 'actors' in it that have specific 'behaviour' depending on their type (player, enemy or projectile, in my case). All of these have a 'tick' function which takes a time delta, but only the world's 'tick' function is called directly (from the controller). In this function all the actors are serviced by calling their 'tick' functions, and these actor tick functions are where all actor-specific game logic is implemented. To make matters a little more complex, I also have 'aiTick' functions, which work in a similar fashion, but instead of updating actor positions etc. they are only for AI tasks. The only reason I have them is that I want the AI to run at a different (lower) rate than the normal tick, to save CPU cycles. The AI tick functions 'roll a die' and, depending on the outcome, the actors do something (like changing direction, shooting, etc.), while the 'normal' tick functions simply update their position according to the current direction and velocity.
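To make the tick/aiTick split concrete, here is a minimal, language-neutral sketch in Python rather than Objective-C. It is not my actual game code: every name in it (`Actor`, `World`, `run_headless`, the 60/10 rates, the one-in-six die roll) is an assumption made up for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Actor:
    """One entity in the world (hypothetical fields)."""
    x: float = 0.0
    y: float = 0.0
    dx: float = 1.0      # current direction, as unit-vector components
    dy: float = 0.0
    speed: float = 10.0  # units per second

    def tick(self, dt: float) -> None:
        # 'Normal' tick: only advance along the current direction.
        self.x += self.dx * self.speed * dt
        self.y += self.dy * self.speed * dt

    def ai_tick(self, rng: random.Random) -> None:
        # AI tick: roll a die and occasionally pick a new direction.
        if rng.randint(1, 6) == 6:
            self.dx, self.dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])


@dataclass
class World:
    actors: list = field(default_factory=list)
    rng: random.Random = field(default_factory=random.Random)

    def tick(self, dt: float) -> None:
        # The only tick the controller calls; it services every actor.
        for actor in self.actors:
            actor.tick(dt)

    def ai_tick(self) -> None:
        for actor in self.actors:
            actor.ai_tick(self.rng)


def run_headless(world: World, seconds: float, fps: int = 60, ai_fps: int = 10) -> None:
    """Drive the model with no view attached; the AI runs at a lower rate."""
    dt = 1.0 / fps
    frames_per_ai_tick = fps // ai_fps
    for frame in range(int(seconds * fps)):
        if frame % frames_per_ai_tick == 0:
            world.ai_tick()
        world.tick(dt)
```

Calling `run_headless(World([Actor(), Actor()]), seconds=5)` moves the actors around without any presentation code being involved, which is the property the model is designed around.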
Summarizing the model and model update part, it works like this: just start a clock, have it call back the model 'tick' and 'aiTick' functions, and the game is 'alive': the actors start doing their thing and moving around etc., even without displaying anything. Add to that a set of functions the controller can call on touch/accelerometer input, such as (for example) [[world player] setDirection] or [[[world actors] anyObject] shoot], and that should give a good impression of the way the game logic is implemented. The world, all actors, all projectiles etc. are completely unaware of their presentation, and the code is portable to OS X or between iPhone/iPad unmodified.
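The input path (main layer catches the event, scene/controller turns it into a model call) could be sketched as below. Again this is Python standing in for Objective-C, and none of it is Cocos2d API: `MainLayer`, `SceneController`, `touch_began`, `on_touch` and the "left half steers left" rule are all hypothetical names and behaviour chosen for the example.

```python
class Player:
    """Model object; knows nothing about touches or screens."""
    def __init__(self):
        self.direction = (0.0, 0.0)
        self.shots_fired = 0

    def set_direction(self, dx, dy):
        self.direction = (dx, dy)

    def shoot(self):
        self.shots_fired += 1


class World:
    def __init__(self):
        self.player = Player()


class SceneController:
    """Stands in for the CCScene: translates input into model calls."""
    def __init__(self, world, screen_width):
        self.world = world
        self.screen_width = screen_width

    def on_touch(self, x, y):
        # Hypothetical rule: left half steers left, right half steers right.
        dx = -1.0 if x < self.screen_width / 2 else 1.0
        self.world.player.set_direction(dx, 0.0)

    def on_shake(self):
        self.world.player.shoot()


class MainLayer:
    """Stands in for the 'main' CCLayer: receives raw events, forwards them."""
    def __init__(self, controller):
        self.controller = controller

    def touch_began(self, x, y):
        self.controller.on_touch(x, y)

    def accelerometer_shake(self):
        self.controller.on_shake()
```

The layer holds no state of its own here; swapping it for a mouse/keyboard handler on another platform would leave `World` and `Player` untouched.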
Now for the view part, I use the following approach: inside my CCLayers I keep a set of sprites for all visible game elements drawn by that layer, which are represented by subclasses of CCSprite and get re-used where possible. For example, I have an 'ActorLayer', a 'ProjectileLayer' and a 'WorldLayer', all of which have different ways of caching/reusing sprites depending on their drawing/performance characteristics (i.e. do the sprites 'live long', how many of them do I expect, how often do they animate, etc.). On every 'draw' call, the layer gets the data set it displays from the model, loops through the elements, looks them up in a dictionary of associated sprites, then updates each sprite to reflect the current element state. For example, for projectiles, I get the set of all 'active' projectiles from the model, loop through it, look up the ProjectileSprite associated with each projectile (the sprite has a reference to the projectile itself), and call its 'update' method, which checks whether any attributes changed and updates the CCSprite. Last but not least, while looping I track all updated sprites in a set, and afterwards I dispose of the ones that weren't updated (which indicates that the actor/projectile they were linked to has disappeared from the model, so the sprite can be disposed of or re-used).
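The per-draw loop described above (get model state, look up sprites, update them, dispose of stale ones) might look roughly like this. It is a Python sketch under stated assumptions, not Cocos2d code: `Projectile`, `World.active_projectiles`, `ProjectileSprite` and `ProjectileLayer` are hypothetical stand-ins, and "disposing" is reduced to dropping the sprite from the dictionary rather than returning it to a pool.

```python
class Projectile:
    """Model object; hashable by identity so it can key a dictionary."""
    def __init__(self, x, y):
        self.x, self.y = x, y


class World:
    def __init__(self):
        self.projectiles = []

    def active_projectiles(self):
        return list(self.projectiles)


class ProjectileSprite:
    """View object: mirrors one projectile, holds only visual attributes."""
    def __init__(self, projectile):
        self.projectile = projectile
        self.position = None

    def update(self):
        # Only touch the visual state when the model actually changed.
        new_position = (self.projectile.x, self.projectile.y)
        if new_position != self.position:
            self.position = new_position


class ProjectileLayer:
    def __init__(self, world):
        self.world = world
        self.sprites = {}  # projectile -> ProjectileSprite

    def draw(self):
        updated = set()
        for projectile in self.world.active_projectiles():
            sprite = self.sprites.get(projectile)
            if sprite is None:
                sprite = ProjectileSprite(projectile)
                self.sprites[projectile] = sprite
            sprite.update()
            updated.add(projectile)
        # Anything not updated this frame has left the model: dispose of it.
        for projectile in list(self.sprites):
            if projectile not in updated:
                del self.sprites[projectile]
```

Note that the layer only reads from the world; it never writes back, which keeps the view side read-only with respect to the model.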
The CCSprite subclasses don't contain any logic, just the visual attributes of what they are displaying. They initialize the CCSprite to its initial state (set the initial display frame, color, anchor point, etc.), and the only way to change any of these afterwards is through their 'update' method. For example, I have projectiles that change their display frame while flying around the screen, which is implemented in the 'update' method by getting a simulation timestamp from the model and using that to calculate what the display frame should be at that point in time. Summarizing the view part of this architecture, it works like this on every draw call: get model state, look up sprites, update sprites, dispose of stale sprites. No more, no less.
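Deriving the display frame purely from the simulation timestamp can be done with a small pure function. The version below is a Python sketch of one possible approach; the function name, its parameters (`spawn_time`, `frame_duration`, `frame_count`) and the looping behaviour are assumptions for illustration, not taken from my code.

```python
def frame_index(sim_time, spawn_time, frame_duration, frame_count):
    """Pick the animation frame for a looping animation.

    sim_time       -- current simulation timestamp from the model (seconds)
    spawn_time     -- timestamp at which the projectile was created (seconds)
    frame_duration -- how long each frame is shown (seconds)
    frame_count    -- number of frames in the animation
    """
    elapsed = max(0.0, sim_time - spawn_time)
    return int(elapsed / frame_duration) % frame_count


# Example: a 4-frame animation at 10 frames per second.
current = frame_index(sim_time=0.25, spawn_time=0.0, frame_duration=0.1, frame_count=4)
```

Because the frame depends only on model time, the sprite needs no animation state of its own, and pausing or slowing the simulation clock pauses or slows the animation with it.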
What does this all mean in terms of code, complexity and architecture? Personally, I think this is about as cleanly as you can separate your data from its presentation, and the code shows this. All of the classes in the VC part of the M-VC architecture are extremely small and concise, 100 lines of code per sprite or layer at most. Only the WorldLayer is a little bigger, because it uses a fairly intelligent sprite caching/pooling mechanism to make it efficient. The division of responsibilities is very strict: anything in the V part relies fully on the state of the M part and has only read access to it; the C part is only there to store shared V data and to handle input events; and the M part is fully decoupled from both the V and the C part, so it could run headless if that made sense. All of the difficult and intelligent parts of the game are in the model, which means that's where all the large and complex parts of the code are concentrated, but the nice thing is that they are 100% portable. In theory I could drop the model into an OS X or iPad port without any modification, and I'd only have to re-implement the simple and small classes in the VC part. Personally I think this is a huge advantage.
I've tried a few other architectures before, when I was still learning Objective-C, which I found to be remarkably different from C++ or Java (which I already had a lot of experience with) in terms of what works well and what doesn't. I've tried using a flat architecture (no separation between data, logic and presentation), I've tried using an MV-C design (one class dedicated to touch handling, and a flat design for everything else), and now I've converged on this M-VC setup, which works really well for me. If you do it this way, subclassing CCSprite is a great way to limit the number of classes and relations you have to manage, compared to combining model and presentation data in the same class. My sprites can have a different lifetime than the objects they display (which allows nice dying animations, for example), it's much easier to recycle and re-use sprites for performance, I can replace the complete presentation layer to include nicer graphics and effects without having to touch any of the game logic, etc.
So although it took a little more time to think this through, I'm really happy that I invested these hours, because they are now saving me bucketloads of time when adding new features, new actor behaviours, new ways of displaying them, etc. If I want to add or change some aspect of the code, the changes are usually isolated from everything else, and I don't get distracted by all kinds of unrelated code scattered around the same files. I'm going to be even happier I did it like this when my game is in the iOS App Store and I want to create an OS X App Store port of it after that. It should only take a tiny fraction of the time it took me to code the iOS version.