In cognitive psychology, chunking is a process by which individual pieces of an information set are broken down and then grouped together into a meaningful whole. The chunks by which the information is grouped are meant to improve short-term retention of the material, thus bypassing the limited capacity of working memory and allowing the working memory to be more efficient. A chunk is a collection of basic units that have been grouped together and stored in a person's memory. These chunks can be retrieved easily due to their coherent grouping. It is believed that individuals create higher-order cognitive representations of the items within the chunk. The items are more easily remembered as a group than as the individual items themselves. These chunks can be highly subjective because they rely on an individual's perceptions and past experiences, which are linked to the information set. The size of the chunks generally ranges from two to six items, but often differs based on language and culture.
According to Johnson (1970), there are four main concepts associated with the memory process of chunking: chunk, memory code, decode, and recode. The chunk, as mentioned above, is a sequence of to-be-remembered information that can be composed of adjacent terms. These items or information sets are to be stored in the same memory code. Recoding is the process by which one learns the code for a chunk, and decoding is the process by which the code is translated back into the information that it represents.
The phenomenon of chunking as a memory mechanism is easily observed in the way individuals group numbers and other information in day-to-day life. For example, when recalling a number such as 12101946, if the digits are grouped as 12, 10 and 1946, a mnemonic is created for this number as a month, day, and year. It would be stored as December 10, 1946 instead of a string of digits. Similarly, the limited capacity of working memory suggested by George Miller can be illustrated with the following example: while recalling a mobile phone number such as 9849523450, we might break it into 98 495 234 50. Thus, instead of remembering 10 separate digits, which is beyond the putative "seven plus-or-minus two" memory span, we remember four groups of numbers. An entire chunk can also be remembered simply by storing the beginning of the chunk in working memory, with long-term memory recovering the remainder of the chunk.
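The two groupings described above can be reproduced with a minimal Python sketch. The function name `chunk_digits` and the idea of passing the group sizes explicitly are assumptions made for illustration; the text only gives the resulting groups, not a procedure for choosing them.

```python
from typing import List, Sequence


def chunk_digits(digits: str, sizes: Sequence[int]) -> List[str]:
    """Split a digit string into consecutive groups of the given sizes.

    Hypothetical helper: the group sizes are supplied by the caller,
    mirroring how a person decides where the chunk boundaries fall.
    """
    if sum(sizes) != len(digits):
        raise ValueError("group sizes must add up to the number of digits")
    chunks = []
    start = 0
    for size in sizes:
        chunks.append(digits[start:start + size])
        start += size
    return chunks


# Month, day, year grouping of 12101946
date_chunks = chunk_digits("12101946", [2, 2, 4])
# Four-group reading of the ten-digit phone number
phone_chunks = chunk_digits("9849523450", [2, 3, 3, 2])

print(date_chunks)   # ['12', '10', '1946']
print(phone_chunks)  # ['98', '495', '234', '50']
```

The ten digits of the phone number become four items to hold in mind, which is the reduction the example in the text relies on.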
A modality effect is present in chunking. That is, the mechanism used to convey the list of items to the individual affects how much "chunking" occurs.
Experimentally, it has been found that auditory presentation results in a larger amount of grouping in the responses of individuals than visual presentation does. Previous literature, such as George Miller's, has shown that the probability of recall of information is greater when the "chunking" strategy is used. As stated above, the grouping of the responses occurs as individuals place them into categories according to their inter-relatedness based on semantic and perceptual properties. Lindley (1966) showed that since the groups produced have meaning to the participant, this strategy makes it easier for an individual to recall and maintain information in memory during studies and testing. Therefore, when "chunking" is used as a strategy, one can expect a higher proportion of correct recalls.
Various kinds of memory training systems and mnemonics include training and drills in specially designed recoding or chunking schemes. Such systems existed before Miller's paper, but there was no convenient term to describe the general strategy, and no substantive and reliable research. The term "chunking" is now often used in reference to these systems. As an illustration, patients with Alzheimer's disease typically experience working memory deficits; chunking is an effective method to improve patients' verbal working memory performance. Chunking has been shown to decrease the load on working memory in many ways. Besides remembering chunked information more easily, a person can also recall other, non-chunked memories more easily because of the benefits chunking has on working memory.
The word chunking comes from a famous 1956 paper by George A. Miller, "The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information". At a time when information theory was beginning to be applied in psychology, Miller observed that some human cognitive tasks fit the model of a "channel capacity" characterized by a roughly constant capacity in bits, but short-term memory did not. A variety of studies could be summarized by saying that short-term memory had a capacity of about "seven plus-or-minus two" chunks. Miller (1956) wrote, "With binary items the span is about nine and, although it drops to about five with monosyllabic English words, the difference is far less than the hypothesis of constant information would require [see also memory span]. The span of immediate memory seems to be almost independent of the number of bits per chunk, at least over the range that has been examined to date." Miller acknowledged that "we are not very definite about what constitutes a chunk of information."
Miller (1956) noted that according to this theory, it should be possible to increase short-term memory for low-information-content items effectively by mentally recoding them into a smaller number of high-information-content items. He imagined this process being useful in scenarios such as "a man just beginning to learn radio-telegraphic code hears each dit and dah as a separate chunk. Soon he is able to organize these sounds into letters and then he can deal with the letters as chunks. Then the letters organize themselves as words, which are still larger chunks, and he begins to hear whole phrases." Thus, a telegrapher can effectively "remember" several dozen dits and dahs as a single phrase. Naïve subjects can remember a maximum of only nine binary items, but Miller reports a 1954 experiment in which people were trained to listen to a string of binary digits and (in one case) mentally group them into groups of five, recode each group into a name (for example, "twenty-one" for 10101), and remember the names. With sufficient practice, people found it possible to remember as many as forty binary digits. Miller wrote:
It is a little dramatic to watch a person get 40 binary digits in a row and then repeat them back without error. However, if you think of this merely as a mnemonic trick for extending the memory span, you will miss the more important point that is implicit in nearly all such mnemonic devices. The point is that recoding is an extremely powerful weapon for increasing the amount of information that we can deal with.
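The recoding scheme Miller describes, grouping binary digits in fives and renaming each group, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `recode_binary` and `decode_binary` are hypothetical, the 40-digit string is made up, and each group is recoded as an integer rather than as a spoken name such as "twenty-one".

```python
from typing import List


def recode_binary(bits: str, group_size: int = 5) -> List[int]:
    """Recode a binary string into one decimal value per fixed-size group."""
    if len(bits) % group_size != 0:
        raise ValueError("length of bits must be a multiple of group_size")
    return [
        int(bits[i:i + group_size], 2)  # e.g. '10101' -> 21
        for i in range(0, len(bits), group_size)
    ]


def decode_binary(codes: List[int], group_size: int = 5) -> str:
    """Translate each code back into its zero-padded binary group."""
    return "".join(format(code, "0{}b".format(group_size)) for code in codes)


# A made-up 40-digit binary string, written as eight groups of five
groups = ["10101", "00011", "11111", "00000",
          "01010", "11000", "00111", "10010"]
bits = "".join(groups)

codes = recode_binary(bits)
print(len(bits), "binary digits ->", len(codes), "chunks")
print(codes)  # [21, 3, 31, 0, 10, 24, 7, 18]
print(decode_binary(codes) == bits)  # True
```

Forty binary digits reduce to eight recoded items, which falls within the "seven plus-or-minus two" range quoted above, while decoding recovers the original string exactly.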
Studies have shown that people remember items better when the items are familiar to them. Similarly, people tend to create familiar chunks. This familiarity allows one to remember more individual pieces of content, and also more chunks as a whole. One well-known chunking study was conducted by Chase and Ericsson, who worked with an undergraduate student, SF, over two years. They wanted to see if a person's digit span memory could be improved with practice. SF began the experiment with a normal span of 7 digits. SF was a long-distance runner, and chunking strings of digits into race times increased his digit span. By the end of the experiment his digit span had grown to 80 digits. A later description of the research in The Brain-Targeted Teaching Model for 21st Century Schools (2012) states that SF later expanded his strategy by incorporating ages and years, but his chunks were always familiar, which allowed him to recall them more easily. A person who lacks knowledge of the expert domain (e.g. familiarity with mile or marathon times) would have difficulty chunking with race times and would ultimately be unable to memorize as many numbers using this method.
Chunking as a method of learning can be applied in a number of contexts, and is not limited to learning verbal material. Karl Lashley, in his classic paper on serial order, argued that the sequential responses that appear to be organized in a linear and flat fashion concealed an underlying hierarchical structure. This was then demonstrated in motor control by Rosenbaum et al. (1983). Thus sequences can consist of sub-sequences and these can in turn consist of sub-sub-sequences. Hierarchical representations of sequences have an advantage over linear representations: They combine efficient local action at low hierarchical levels while maintaining the guidance of an overall structure. While the representation of a linear sequence is simple from a storage point of view, there can be potential problems during retrieval. For instance, if there is a break in the sequence chain, subsequent elements will become inaccessible. On the other hand, a hierarchical representation would have multiple levels of representation. A break in the link between lower level nodes does not render any part of the sequence inaccessible, since the control nodes (chunk nodes) at the higher level would still be able to facilitate access to the lower level nodes.
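The contrast between linear and hierarchical retrieval can be made concrete with a toy Python model. Everything here is an assumption made for illustration: the function names, the nine-letter sequence, and the simplification that a chunk node reaches each of its items directly, so that only item-to-item links can break. It is a sketch of the argument, not a model taken from Lashley or Rosenbaum et al.

```python
from typing import List, Set, Tuple


def recall_linear(sequence: List[str], broken_after: Set[int]) -> List[str]:
    """Walk a flat chain item by item and stop at the first broken link.

    broken_after holds positions i whose link to position i + 1 is broken.
    """
    recalled = []
    for i, item in enumerate(sequence):
        recalled.append(item)
        if i in broken_after:
            break  # nothing after a broken link can be reached
    return recalled


def recall_hierarchical(chunks: List[List[str]],
                        broken_after: Set[int]) -> Tuple[List[str], int]:
    """Walk the chunk nodes; each node reaches its own items directly.

    Returns the recalled items and the number of broken item-to-item
    links that were bypassed through a chunk node.
    """
    recalled = []
    bypassed = 0
    position = 0  # position in the flattened sequence
    for chunk in chunks:      # higher-level walk over chunk nodes
        for item in chunk:    # access through the chunk node, not the chain
            recalled.append(item)
            if position in broken_after:
                bypassed += 1
            position += 1
    return recalled, bypassed


sequence = list("ABCDEFGHI")
chunks = [list("ABC"), list("DEF"), list("GHI")]
broken = {3}  # the link from D to E is broken

linear_result = recall_linear(sequence, broken)
hier_result, bypass_count = recall_hierarchical(chunks, broken)
print(linear_result)  # ['A', 'B', 'C', 'D']
print(hier_result)    # all nine items
```

In the flat chain, recall stops at D, whereas the hierarchical walk still returns every item because the second chunk node gives access to E and F independently of the broken link.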
Terrace (2001) identified chunks in motor learning by the pauses between successive actions. He also suggested that during the sequence performance stage (after learning), participants download list items as chunks during pauses. He further argued for an operational definition of chunks that distinguishes input chunks from output chunks, drawing on the ideas of short-term and long-term memory. Input chunks reflect the limitation of working memory during the encoding of new information (how new information is stored in long-term memory), and how it is retrieved during subsequent recall. Output chunks reflect the organization of over-learned motor programs that are generated on-line in working memory. Sakai et al. (2003) showed that participants spontaneously organize a sequence into a number of chunks across few sets, and that these chunks were distinct among participants tested on the same sequence. They also demonstrated that performance of a shuffled sequence was poorer when the chunk patterns were disrupted than when the chunk patterns were preserved. Chunking patterns also seem to depend on the effectors used.
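One simple way to operationalize pause-based chunk identification is to start a new chunk whenever the gap before an action exceeds a threshold. The Python sketch below is an assumption for illustration only: the function name, the 500 ms threshold, and the timing data are all hypothetical, and the text does not state what criterion Terrace actually used.

```python
from typing import List, Sequence


def segment_by_pauses(actions: Sequence[str],
                      timestamps_ms: Sequence[int],
                      pause_threshold_ms: int) -> List[List[str]]:
    """Group actions into chunks, splitting wherever a long pause occurs.

    A new chunk starts when the time since the previous action is
    greater than pause_threshold_ms (a hypothetical cutoff).
    """
    if len(actions) != len(timestamps_ms):
        raise ValueError("actions and timestamps_ms must have equal length")
    chunks: List[List[str]] = []
    for i, action in enumerate(actions):
        is_first = i == 0
        long_pause = (not is_first and
                      timestamps_ms[i] - timestamps_ms[i - 1] > pause_threshold_ms)
        if is_first or long_pause:
            chunks.append([])  # open a new chunk
        chunks[-1].append(action)
    return chunks


# Made-up key presses with their onset times in milliseconds
actions = list("ABCDEFGH")
timestamps_ms = [0, 200, 400, 1200, 1400, 1600, 2500, 2700]

motor_chunks = segment_by_pauses(actions, timestamps_ms, pause_threshold_ms=500)
print(motor_chunks)  # [['A', 'B', 'C'], ['D', 'E', 'F'], ['G', 'H']]
```

With these made-up timings, the long gaps before D and G mark the chunk boundaries; a different threshold, or a different participant's timings, would yield a different segmentation of the same sequence.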
Previous research shows that the mechanism of chunking is available in seven-month-old infants. This means that chunking can occur even before working memory capacity has completely developed. Because working memory has a very limited capacity, chunking can be beneficial, and it may be even more helpful for infants, whose working memory capacity is not yet fully developed. These studies used the violation-of-expectation method and recorded the amount of time the infants watched the objects in front of them. Although the experiment showed that infants can use chunking, researchers also concluded that an infant's ability to chunk memories will continue to develop over the next year of their lives.
The use of "chunking" to describe the learning of long-term memory structures derives from Miller's (1956) idea of chunking as grouping, but the emphasis is now on long-term memory rather than only on short-term memory. A chunk can then be defined as "a collection of elements having strong associations with one another, but weak associations with elements within other chunks". Chase and Simon (1973) and later Gobet, Retschitzki and de Voogt (2004) showed that chunking could explain several phenomena linked to expertise in chess. Following a brief exposure to pieces on a chess board, skilled chess players were able to encode and recall much larger chunks than novice chess players. However, this effect is mediated by specific knowledge of the rules of chess; when pieces were distributed randomly (including scenarios that were not common or allowed in real games), the difference in chunk size between skilled and novice chess players was significantly reduced. Several successful computational models of learning and expertise have been developed using this idea, such as EPAM (Elementary Perceiver and Memorizer) and CHREST (Chunk Hierarchy and Retrieval Structures). Chunking has also been used in models of language acquisition.