What is reward-based training?

A rewarding experience

Reward-based training is the use of positive reinforcement: giving a dog, horse or cat something they want after they complete a particular action, so that action is strengthened and repeated.

Rewards teach an animal what behaviour to do. What gets rewarded gets repeated. Positive reinforcement is the only one of the four operant conditioning quadrants that creates a pleasurable experience – it is adding something pleasant.

For dogs, rewards can be used to train basic obedience (sit, down, stand, stay, come, heel), self-control (loose-lead walking, leave, stay), tricks, sports, and advanced behaviours needed for assistance and working dogs. It’s possible to teach a dog to do anything they are physically and mentally capable of doing, all using treats, toys, praise, fuss and environmental rewards such as running, playing and sniffing.

For horses, rewards can be used to train basic movements (walk, trot, stand, back, come), handling and management routines (health care, rugs, clipping, tacking up, leading), transport, dressage, jumping, other sports, tricks and anything you choose to try with them. All with food, scratches, social time, and access to freedom, friends and interesting places.

For cats, rewards can be used to train calmness, recall, place, tricks, grooming and encouraging appropriate play. All with food, play and companionable time spent together.

Any behavioural issue can be improved with the proper use of rewards – some issues need more time and planning than others, but it’s always possible.

Mainly it’s about teaching animals a better way of responding to a trigger, and keeping them under threshold (the point where they can still learn).

Instead of thinking ‘how can I stop them doing something?’, think ‘what can I reward them for doing?’. So, if your dog is being aggressive to other dogs, reward them for walking away, focusing on you, watching calmly from a distance etc. If your horse is trying to bite you when you groom them, reward them for standing still and tolerating the sensation of being stroked all over, then with a soft brush etc. If your cat is weeing on the furniture, provide outlets for natural behaviour, reduce their stress and provide more litter trays in better locations – then you can reward them for being good!

Why focus on rewards?

Rewards increase motivation.

Rewards increase willingness.

Rewards increase understanding of the task.

Rewards create a cooperative relationship between trainer and trainee.

Rewards increase trust.

Rewards can be used to shape any behaviour an animal is physically and mentally able to complete.

Rewards allow us to help animals cope with potentially stressful or painful events such as visits to the vets or groomers.

Rewards allow for better (two-way) communication.

Rewards allow for a trainee to say 'no', and for the trainer to find out why without forcing them to comply.

Rewards teach an animal what TO DO.

Rewards also teach an animal what NOT TO DO by the very fact they are being rewarded for what TO DO.

Rewards increase the likelihood of spotting difficulties or health issues, as it makes the trainer more sensitive to the behaviour, emotions and thought processes of the trainee.

Rewards mean that a trainer focuses on using positive reinforcement, and avoiding punishment and negative reinforcement.

Rewards create effective training. Just because it's kinder, it doesn't mean it's 'permissive' or 'soft' training.

Rewards allow trainees to really enjoy the experience, and to develop internal motivation for the task.

Rewards set up a trainee for success.

Rewards not bribery

It’s important that rewards in training are used correctly.

Bribery is about showing an animal something they want, and luring them towards it. A dog might be encouraged to go into a crate, or a horse into a horse box, or a cat into another room. But animals quickly learn when they have been ‘tricked’ – they haven’t chosen to go to these places, or do these things, and then they stop responding to the bribery in the future.

Waving food in front of the nose of a dog who wants to chase a squirrel or play with another dog isn’t going to work either, because that food (even high value food) can’t compete.

The difference between rewards and bribery is that rewards are given during a planned training event – and the animal chooses to complete a behaviour to earn the reward. This makes the final behaviour much stronger. It means a dog will choose to go into a crate, a horse will choose to go into a trailer, and a cat will choose to follow a cue to go into another room.

It’s not just about dishing out food though – you have to set them up for success, and you have to make sure that the behaviour you are asking for isn’t frightening or stressful for them. Shaping helps a lot with this (see below).

Desensitisation and counterconditioning

Animals with intense emotional reactions to triggers or events can be helped to overcome them, and regain their calmness or confidence, by the careful introduction of the trigger. This is done either at a low enough level that they can gradually tolerate more (desensitisation), or by training them to have the opposite response to the trigger e.g. changing fear to excitement (counterconditioning). Using rewards throughout these processes speeds up the training, and makes it into a pleasurable experience.

Classical conditioning

This is the creation of a strong link between a stimulus and response. For example, the rustle of a treat bag means a dog happily expects a treat, because they’ve linked the noise to the presentation of food. This is also how punishment training works, for example the tone on an e-collar becomes linked to an unpleasant sensation (vibration, spray, shock) so they act when they hear the tone; or the word ‘no’ becomes linked to other unpleasant or unwanted consequences. This is also known as Pavlovian learning.

Operant conditioning

This is where the environment, and what happens after an animal completes an action, influences what happens in the future. There are four ‘quadrants’ of operant conditioning. Two are reinforcement, which increase a behaviour. Two are punishment, which decrease a behaviour. They can be either positive, and ADD something the animal likes or dislikes; or they can be negative, and TAKE AWAY something the animal likes or dislikes.

The four quadrants are:

positive reinforcement = add something pleasant to increase a behaviour e.g. giving food to lengthen a stay
negative reinforcement = take away something unpleasant to increase a behaviour e.g. taking away lead pressure to improve loose lead walking
positive punishment = add something unpleasant to decrease a behaviour e.g. an e-collar to stop a dog chasing livestock
negative punishment = take away something pleasant to decrease a behaviour e.g. withholding a toy until a dog stops barking
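
For readers who find it easier to see the pattern laid out as a lookup, the four quadrants above can be sketched in a few lines of Python. This is only an illustration of the definitions given here: the names `Change`, `Effect` and `quadrant` are made up for this example and are not standard training terminology.

```python
from enum import Enum


class Change(Enum):
    """What the trainer does to the animal's situation (hypothetical name)."""
    ADD = "add"
    TAKE_AWAY = "take away"


class Effect(Enum):
    """What happens to the behaviour afterwards (hypothetical name)."""
    INCREASE = "increase"
    DECREASE = "decrease"


def quadrant(change: Change, effect: Effect) -> str:
    """Name the operant conditioning quadrant for a given combination."""
    # 'Positive' means something is added; 'negative' means something is taken away.
    sign = "positive" if change is Change.ADD else "negative"
    # Reinforcement increases a behaviour; punishment decreases it.
    kind = "reinforcement" if effect is Effect.INCREASE else "punishment"
    return f"{sign} {kind}"


# Giving food to lengthen a stay: something is added, the behaviour increases.
print(quadrant(Change.ADD, Effect.INCREASE))
```

The point of the sketch is that 'positive' and 'negative' only describe adding or taking away, not whether the experience is pleasant for the animal.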

LIMA - Least Intrusive, Minimally Aversive

This means you should always be working hard to ensure your pet is set up to succeed – so that they can be rewarded for correct behaviours. If punishment and negative reinforcement have to be used, they should be at the lowest level to be effective, and should only be done as a last resort.

Good prevention and management techniques will ensure that you don’t have to reach for the aversive training tools and methods.

LIFE - The least inhibitive, functionally effective approach

This is a newer acronym, aimed at improving ethical training further still. LIMA was beginning to be used to justify the use of aversive punishment, so LIFE was created to avoid this.

It's about giving animals more choice, and limiting the ways they are inhibited to act/ behave/ live; recognising the function of behaviours being performed (why are they really doing it), plus an animal's needs; and making sure that 'success' isn't just based on whether something worked to stop a behaviour i.e. there's much more going on, and the animal's emotions, welfare and long-term wellbeing need to be considered.

NRM – No-Reward Marker

This is a word like ‘oops’, ‘wrong’ or ‘ah-ah’ that’s used to help a dog learn they’re not going to get the expected reward, and that they need to try again. Although not used by force-free trainers, it can be a useful technique to help a dog learn to avoid the wrong thing as well as teaching them to do the right thing. No-Reward Markers are usually verbal, and should not be shouted in an angry or threatening way; it’s just about giving information. They can also be body language cues – the most common of these is turning your back on a bitey or jumpy puppy. However, it's easy for them to be over-used, or for a dog to become anxious or frustrated. It's often better to keep quiet, and continue to focus on rewarding them for doing the right thing.

LRS – Least Reinforcing Scenario

This is where a dog is given two choices, with the most reinforcing option (i.e. the rewards) coming from the owner/ trainer in response to a particular cue. So, if a dog is staring at a squirrel on lead, the lead will be preventing them chasing the squirrel, which means it’s more reinforcing for the dog to turn to their owner and get a treat or a game of tug. This needs careful planning to be successful.

The other version of the LRS is the pause in training. Rather than giving a No-Reward Marker as such, the trainer simply pauses, and tries again a few seconds later. For example, if a dog is asked to sit (when they have been trained to know what this means) but they don’t, there’s a three second pause before the cue is given again. This is repeated until the sit is offered, and then the dog can be rewarded.

Shaping aka Successive Approximations

This is the way you train a final behaviour – by working on all the tiny steps you need to get to the end goal. If you want a dog to settle in their bed, first you need to teach them to go to their bed, put all four paws on it, turn round, lie down, stay longer, approach the bed from further away, and finally do all this on a single cue ‘go to bed’.

If at any stage the trainee becomes confused, stressed or over-excited, smaller steps can be created. This is what makes reward-based training so adaptable. Any problem can be worked on using this process. Trainers who want more precision can use clickers, which accurately mark the exact moment their trainee did the right thing (a reward is then given).

Thresholds

There’s always a point when a trigger or situation becomes so exciting, stressful, scary or annoying that an animal can no longer think clearly, and reacts in a purely emotional way. Learning, or at least the quality learning you intended, will not be possible when an animal has gone over their threshold.

This will be different for every individual, and will vary with the day, weather conditions, time and countless other factors. Train the animal in front of you – stay in the present. If you’re too close to a trigger, move further away. If you’re putting the animal under too much stress, back off. Give them time. Give them safety. Give them a moment to breathe.

The benefits of rewards

Using rewards in training makes anything possible…with a bit of planning and time. It teaches animals to work out the correct behaviours, and to enjoy the process as well as the outcome. They learn how to learn, and be creative, and this is what speeds up any future training. It also helps to cement the bond with your pet, and allows you to work as a team, a partnership, rather than you needing to ‘be the boss’.

‘No’ is okay but…

Although force-free and positive-only trainers will avoid all forms of punishment and negative reinforcement, many reward-based trainers and owners will use other areas of the operant conditioning quadrants. But applying the LIMA principle, and focusing on rewards, means everything else is kept to a minimum.

For example, a dog can be taught a ‘leave’ cue with a firm vocal cue, a bit of body language to block, and perhaps being held back on a lead (as well as being rewarded for leaving the item); or a reactive dog can be pulled back on a lead and flat collar or harness to avoid them getting close to another dog that appears suddenly (as well as being rewarded for stepping away from other dogs or focusing on their owner); or a reward can be held back until their behaviour changes for the better…these versions of ‘no’ are subtle. This is very unlike coercive trainers and some balanced trainers who are quick to use more forceful versions of no, including prong/ shock collars, slip leads, shouting, tactile corrections, strong body language pressure, repeated ‘drills’, and forced movements (sit and down or alpha roll).

Choice

Part of good communication is finding out what our pets are thinking and feeling, and helping to guide them to correct decisions so we can reward them. All animals (and people) can cope better with life when they can control and/ or predict what’s going to happen, and when they have a choice about the things that happen in their life. It might be about giving your pet several resting areas rather than only one bed. It might be giving them several toy options rather than only leaving the same two out every day. It might be letting a dog decide on a direction to walk in. It might be letting a horse graze between training sessions.

Choice isn’t just about compliance or noncompliance, it’s a lot more nuanced. A dog being corrected with a highly aversive technique has very little control and choice - pain or stopping of pain - especially when there hasn't been a 'no' cue first to give them a chance to avoid the punishment. Whereas a dog being taught to walk away from triggers (or stand still near them) and given a reward (food, tug, praise) is choosing to do the action to get the reward, which allows more communication to happen, and active engagement in problem solving.

When a dog would rather choose to chase a squirrel or bark at a trigger etc, it's up to us to teach them what to do instead, using rewards, rather than waiting for them to fail and then using positive punishment or negative reinforcement.

Isn't punishment used at all?

It would be inaccurate to say punishment isn't used during training, because the scientific definition of punishment is that it reduces or stops behaviour, and that's the goal of a lot of training even if only rewards are used to teach alternative behaviours. Punishment (according to the four quadrants of operant conditioning) can either be adding something unpleasant, or taking away something pleasant. The aim should be to do the least aversive thing possible - in most cases this will be preventing access to a potential reward. For example, when teaching a dog not to jump up, you might hold onto a food reward until they can keep all 4 feet on the ground. Saying 'no' or other noises might count as mildly aversive, as well as it making it clear that access to a potential reward is being prevented. There are many highly aversive tools and techniques used to train animals, such as prong and shock collars, and these should be avoided (these are discussed in my book 'Ethical Pet Training').

Similar to punishment is negative reinforcement, which is the removal of something unpleasant to increase a behaviour. For example if your dog begins to pull, you wait until they step back or look at you, and then you relax the pressure on the lead and allow them to walk on again. The skill is in keeping this a very soft contact, and the timing needs to be perfect. It's important to remember that although it's a reinforcement, it isn't a reward, since something unpleasant had to happen first in order to remove it.

When a dog doesn't listen, there's a temptation to use stronger punishment or negative reinforcement, to make them comply, but this can cause many more problems and isn't really teaching them what to do (it's just telling them off for getting it wrong). It's a natural part of being an emotional human, with life throwing up all sorts of difficulties and complications, that we end up getting so cross with our dogs that we think it's okay or logical to use aversive training. But we can rise above this, and find better ways to train - it just might need a bit of work on ourselves as well as our dogs to help us move forward.

Sometimes punishment is used when managing a dog's behaviour before training has caught up with real life e.g. having to move them away from another dog if they're not ready to be that close; or using a crate or barrier to prevent them making mistakes even if this causes some frustration or anxiety; or keeping them on a lead in the home so they can be made to get off sofas etc if they are learning not to protect chosen areas. If an emergency situation arises, it's important to learn from it, to make sure mistakes can be prevented in the future. Management strategies and emergency protocols should be one-off events, and not part of regular training.

Training should always be focused on providing rewards to increase wanted behaviour (positive reinforcement).

The problems with aversives

Although using aversives can appear to solve a problem, they don’t address the underlying issues. This can mean that other behavioural issues develop. The things we usually find upsetting – barking, growling, kicking, biting, running away, being destructive, being protective etc – are an animal’s way to deal with troubling things, and are their way to communicate how they’re feeling. To punish one element of their life, while ignoring everything else, means they now have to suffer in silence. Aversives can also backfire and make an animal more anxious, fearful, frustrated or angry.

Using aversives is not the only way to teach a dog to stop bad behaviours, or to teach them rules and boundaries – rewards do a good job of this too. The more rewards are used, the less punishment is needed, and the punishment that does have to be used can be at a much lower level (the LIMA principle in action).

Sadly, many humans believe they have to be in charge, and to make an animal submit to them using force. This is not necessary, and is much more about the person than the animal.

The temptation of the ‘quick fix’

When people are short of time, and are angry or upset about their pet’s behaviour, they will be tempted to sort out the issues quickly. But just stopping an unwanted behaviour doesn’t address the reason it started in the first place. Animals always do things for a reason. It might be an emotional reaction to something, or a learned response, or because they’re not getting enough exercise, mental stimulation, social time or enrichment. To punish a dog for barking, growling, pulling, lunging, toileting in the wrong place, or any number of other problems, does not help the dog to live a happy life.

Address the issues, change their environment or routines, and reward them for getting it right. This might take longer in the beginning, but it will be a change that lasts a lifetime, and will help in other areas of the dog’s life. The same is true with the myriad of things a horse or cat does wrong – we need to help them, not punish them.

The last resort - are aversives the only way?

There has been a lot of debate recently about whether positive reinforcement training is ineffective with highly troubled or aggressive dogs, especially those who are languishing in shelters, effectively on death row. The argument is that only coercive/ punitive/ balanced training can ‘cure’ these dogs and allow them to be adopted. This is completely false. Any animal can be trained with reward-based training as long as the training and the environment have been carefully planned to allow the animal to succeed. This is as much about building trust as it is about shaping a behaviour. It also means the dog is not forced into a situation they cannot cope with. It means they are constantly being rewarded for getting it right, and their carers and potential new owners know how to continue the training, but will also recognise their limits.

There is no evidence to say that more dogs end up in shelters, or are euthanised because of reward-based training, in the same way there is no evidence to say they ended up there because of punitive training. Many have not received any training at all. We have a choice – we can either use rewards to teach them what to do, and build their trust and their willingness, and help them adapt…or we can resort to heavy-handed training using prong collars and shock collars, and enforcing compliance with our wishes despite their fear, confusion or anger. Some dogs, however, have had such a tough time, and are so traumatised being in a shelter environment, that they may need to be euthanised, for the safety of people or dogs they would come into contact with in the future. This is an incredibly emotive subject, but dogs live in the present, and if they are suffering, sometimes euthanasia is the kindest thing to do. There are many millions of animals worldwide who are killed every day for food, clothing, sport, entertainment and science. Their lives are no more or less important than any other animal. It’s in our power to make lives better for all animals while they are alive, and to question the choices we make in their care, training, and the way they end their lives.

(c) Sarah Crockford 2024