Authorial Style in the New Testament

A Cluster Analysis According to Lexical Contacts

Some time ago I shared with the B-GREEK mailing list some of the results of my stylometrical analysis of the NT canon using the concept of a lexical contact. My analysis had been directed to investigating the authorship of the disputed Pauline epistles. This message presents the result of a cluster analysis on the issue. Briefly, this analysis shows a complex relation between the Pastorals (1, 2 Tim. & Tit.) and the rest of the Pauline Corpus. Although stylistically distinct overall and in terms of vocabulary, the Pastorals are nonetheless quite close to the final chapters of Paul's letters in terms of shared phraseology. In addition, this method indicates that Colossians and Ephesians are quite consistent with Paul's style, and that the Johannine epistles are related to the Gospel of John, especially chapters 14-17. Hebrews, however, is not stylistically Pauline, nor is Revelation Johannine.

A "lexical contact" between two books or corpora is a shared word or phrase (of which each word is in lexical form). The "order" of a lexical contact is the number of words being compared at a time. Thus, "first-order lexical contacts" comprise the shared vocabulary between two corpora, and "third-order lexical contacts" are the shared three-word phrases. Although lexical contacts of other orders are possible, the third order is used because it generates the most contacts.
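As a rough illustration of the definition (not the program actually used for this analysis), lexical contacts of a given order can be computed as the intersection of two sets of word sequences. The function names and the toy transliterated lemma lists are assumptions made up for the example; they are not real NT data.

```python
def ngrams(lemmas, order):
    """Return the set of distinct `order`-word phrases in a lemma sequence."""
    return {tuple(lemmas[i:i + order]) for i in range(len(lemmas) - order + 1)}


def lexical_contacts(lemmas_a, lemmas_b, order):
    """Return the phrases of the given order that occur in both corpora."""
    return ngrams(lemmas_a, order) & ngrams(lemmas_b, order)


# Toy lemma sequences, invented for illustration only.
book_a = ["charis", "su", "kai", "eirene", "apo", "theos"]
book_b = ["eirene", "apo", "theos", "pater", "kai", "charis"]

first_order = lexical_contacts(book_a, book_b, 1)   # shared vocabulary
third_order = lexical_contacts(book_a, book_b, 3)   # shared three-word phrases
```

With these toy inputs the two books share five words but only one three-word phrase, which is why the two orders can give quite different pictures.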

One further concept is defined with respect to a supercorpus, in this case the NT canon. An "exclusively shared lexical contact" is a contact found in only two corpora of the supercorpus. I shall use the term "characteristic" as a shorthand for this concept.
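A minimal sketch of how characteristic contacts might be collected, assuming each corpus is available as a list of lemmas: record which corpora contain each phrase, then keep the phrases held by exactly two. The names `characteristic_contacts` and `holders` and the three toy corpora are hypothetical.

```python
from collections import defaultdict


def ngrams(lemmas, order):
    """Return the set of distinct `order`-word phrases in a lemma sequence."""
    return {tuple(lemmas[i:i + order]) for i in range(len(lemmas) - order + 1)}


def characteristic_contacts(corpora, order):
    """Map each pair of corpus names to the contacts found in exactly those two."""
    holders = defaultdict(set)          # phrase -> names of corpora containing it
    for name, lemmas in corpora.items():
        for gram in ngrams(lemmas, order):
            holders[gram].add(name)
    result = defaultdict(set)
    for gram, names in holders.items():
        if len(names) == 2:             # exclusively shared within the supercorpus
            result[tuple(sorted(names))].add(gram)
    return dict(result)


# Toy supercorpus, invented for illustration only.
supercorpus = {
    "A": ["logos", "theos", "phos"],
    "B": ["theos", "phos", "zoe"],
    "C": ["theos", "kosmos", "zoe"],
}
exclusive = characteristic_contacts(supercorpus, 1)
```

Here "theos" occurs in all three toy corpora and so is not characteristic of any pair, while "phos" is characteristic of A and B.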

Lexical contacts, especially for phrases, tend to show a common authorship because (it is hoped) an author has certain pet expressions that recur. However, as we shall see, the method cannot distinguish common authorship from literary dependence, in which large amounts of one work have been incorporated into another. It can show, on the other hand, that two corpora are sufficiently distinct to cast doubt upon a thesis that they have a common originator of expression.

Cluster analysis is a procedure that hierarchically groups the closest two items (or clusters) at a time into a larger cluster until all the items have been clustered. The closeness is measured by a distance function. For this analysis, the distance function is calculated by counting the number of contacts. For each book, the number of contacts to another is calculated and normalized to account for the length of the books. The pair with the greatest number of normalized contacts is then combined to form a corpus and put back into the analysis.
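The procedure can be sketched as a greedy loop. This is only an illustration under stated assumptions: each corpus is represented as a set of distinct phrases, a merged cluster is taken to be the union of its members, and the `overlap` similarity (shared items divided by the geometric mean of the set sizes) is a stand-in for the normalized contact count, not the formula actually used.

```python
from itertools import combinations
from math import sqrt


def overlap(x, y):
    """Assumed similarity: shared items scaled by the sizes of both sets."""
    return len(x & y) / sqrt(len(x) * len(y)) if x and y else 0.0


def cluster(corpora, similarity):
    """Repeatedly merge the closest pair; return the merges in order."""
    active = dict(corpora)              # name -> set of words or phrases
    merges = []
    while len(active) > 1:
        a, b = max(combinations(sorted(active), 2),
                   key=lambda pair: similarity(active[pair[0]], active[pair[1]]))
        merges.append((a, b, similarity(active[a], active[b])))
        # Assumption: the combined corpus is the union of the two members.
        active[a + "+" + b] = active.pop(a) | active.pop(b)
    return merges


# Toy item sets, invented for illustration only.
toy = {
    "mt": {"a", "b", "c", "d"},
    "mk": {"a", "b", "c"},
    "jn": {"x", "y", "a"},
    "j1": {"x", "y", "z"},
}
merges = cluster(toy, overlap)
```

On the toy data the first merge joins "mk" and "mt", the second joins "j1" and "jn", and the last joins the two resulting clusters. The order of the merges is what a dendrogram displays.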

Two different normalizations were used in this analysis. The first normalized the contacts based on the number of distinct words or phrases in each corpus. The second normalized based on the total number of contacts of each corpus.
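The exact formulas are not given here, so the following is only one plausible reading of the two normalizations, with dividing by a geometric mean being an assumption of the sketch: the first scales a raw contact count by the number of distinct words or phrases in each corpus, and the second scales it by each corpus's total number of contacts across all pairs. All names and numbers are hypothetical.

```python
from math import sqrt


def normalize_by_size(contacts, distinct_a, distinct_b):
    """First normalization (assumed form): scale by the distinct-item counts."""
    return contacts / sqrt(distinct_a * distinct_b)


def normalize_by_totals(pair_counts):
    """Second normalization (assumed form): scale by each corpus's total contacts."""
    totals = {}
    for (a, b), n in pair_counts.items():
        totals[a] = totals.get(a, 0) + n
        totals[b] = totals.get(b, 0) + n
    normalized = {}
    for (a, b), n in pair_counts.items():
        denom = sqrt(totals[a] * totals[b])
        normalized[(a, b)] = n / denom if denom else 0.0
    return normalized


# Toy raw contact counts between three corpora, invented for illustration only.
raw = {("co", "ep"): 8, ("co", "rm"): 2, ("ep", "rm"): 8}
by_totals = normalize_by_totals(raw)
by_size = normalize_by_size(6, 4, 9)
```

Whatever the precise formula, the point of either normalization is that a long book should not look close to everything simply because it contains more words and phrases.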

In addition to performing the cluster analysis upon each book of the New Testament, I have also performed it upon each chapter. The results are quite interesting and will be mentioned (with a link to the appropriate page) where relevant.

The following is a presentation and analysis of my results.

Case I: Characteristic Phrases (by length)

EC31 - [by chapters]

25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00  
*--------------------------------------------------------jd,p2 
 \-hb-*--*--*-----------c1-*--*--*-----pp----------------------------q1,q2
       \  \  \              \  \  \-------------------------------------co,ep
        \  \  \              \  \----------c2,pm
         \  \  \              \-----ga,rm
          \  \  \----------------------------jm,p1
           \  \------------------------------------------------ t1-t2,tt
            \--rv-ac-*-----------------------------lk-mk,mt
                      \-------------------------jn----------j1-------------j2,j3

Notes:

  1. The Johannine Epistles and the Gospel of John are clustered. In the chapter analysis, the Epistles are associated with Jn 13-15, 17.
  2. The Synoptic Gospels are clustered (hence the famous problem).
  3. Colossians and Ephesians cluster early, as well as 1, 2 Thessalonians.
  4. The Pastorals cluster early yet are further separated from the 10 Paulines than the Catholic James and 1 Peter. In the chapter analysis, they are closely associated with Rom. 16 and 1Cor. 16.
  5. 2 Peter and Jude cluster early yet remain distinct. Some of this clustering is undoubtedly due to a literary (not authorial) dependence.
  6. Hebrews is distinct.

Case II: Common Phrases (1st normalization)

SC31 - [by chapters]

25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00  
*--*-----ac-------hb,rv
 \  \-------*--------------------lk-------------mk,mt
  \          \----------------------------jn-------------------j1----------j2,j3
   \--*--------*--------c1----c2-------------*-----pp-------------q1----pm,q2
       \        \                             \----------------------co,ep
        \        \---------jm----------ga,rm
         \-----------*--------------p1-------------------jd,p2
                      \-------------------------------t1----t2,tt

Notes:

  1. Again we see the Pastorals, Jude & 2 Peter, Col./Eph., Synoptics, and the Johannines cluster.
  2. Col./Eph. are well within the Pauline camp. Even with the intrusive entry of James, the Pastorals remain distinct from Paul, being closer to the Petrines.
  3. In the chapter analysis, the Pastorals group with the final chapters of Rom., 1Cor., 2Cor., Gal. and Philip.
  4. Also, the Johannines cluster with John 14-17.

Case III: Characteristic Vocabulary (1st norm.)

EC11

25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00  
j3-*--q2-q1-pp-ga-*--c2-p1-rm-*--*--c1-jm-hb-rv-jn-ac-lk-mk,mt
    \              \           \  \----------------------------t2----------t1,tt
     \              \           \---------------------------jd,p2
      \              \--------------------------------------------ep----co,pm
       \-------------------------------------------------------------j1,j2

Notes:

  1. The results of this run are pathological; everything pretty much clustered against the Gospels.
  2. The order of clustering is interesting: 1 Cor. and 1 Pet. are outliers.

Case IV: Common Vocabulary (1st norm.)

SC11 - [by chapters]

25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00  
*--ac----*-----lk----------mk,mt
 \        \-------------------jn,rv
  \---*-----*--------*-----------------c2-------ga----pp----------q1,q2
       \     \        \-------------c1-------------------co,ep
        \     \---------rm----------------jm,p1
         \--------hb-------------*-----------*--------------tt-------j1-pm-j2,j3
                                  \           \----------------jd,p2
                                   \---------------t1,t2

Notes:

  1. Some of the smaller books clustered early, so this Case is not useful.
  2. However, note that the Pastorals are distinct from Paul, but Colossians is in the middle of Paul.
  3. The Pastorals are distinct from Paul in the chapter analysis as well.
  4. In the chapter analysis, the Johannines cluster with Jn 14-17.

Case V: Characteristic Phrases (2nd norm.)

EC32 - [by chapters]

25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00  
*--------rv-*--ac-hb-*--*--*--c1-------------ga,rm
 \           \        \  \  \----------------------c2,pm
  \           \        \  \------*--*-----------------------------------jd,p2
   lk-mk,mt    \        \         \  \----------pp-------------q1,q2
                \        \         \------------------------co,ep
                 \        \------------*--------------------------tt-t1,t2
                  \                     \----------------jm,p1
                   \----------------------jn----------j1-------------------j2,j3

Notes:

  1. The Pastorals cluster with other Catholics first (except for 2 Peter/Jude, right in the middle of the Pauline 10). In the chapter analysis, they are found with 1Cor. 16 and Rom. 16.
  2. The Johannines cluster together, and with Jn 13-17, 21.

Case VI: Shared Phrases (2nd norm.)

SC32 - [by chapters]

25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00  
rv-ac-*--*--lk-mk,mt
       \  \----------jn----------------------------------j1----------------j2,j3
        \---------hb----jm-*--*--c1-*--c2-------*--------------pp,pm
                            \  \     \           \----q1,q2
                             \  \     \---------------------co,ep
                              \  \-----------ga,rm
                               \----------*--------------------------tt-t2,tt
                                           \-------p1-------------p2,jd

Notes:

  1. Johannines cluster together (with John), especially with chs. 14-17.
  2. Petrines cluster, and with the Pastorals.
  3. Col/Eph right in the middle of the Paulines, but the Pastorals just outside. However, the Pastorals cluster with the final chapters of Rom., 1,2Cor., Gal. and Philip.

Case VII: Characteristic Vocabulary (2nd norm.)

EC12

25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00  
rv-ac-*--jm-hb-p1-c1-*--------rm-*-----------------q1----------ep-------co,pm
       \              \           \----ga----------------c2,q2
        \              \------------pp----*-----------t1-------------t1,tt
         \                                 \----------------------jd,p2
          \-------------*--lk----------------mk-j3,mt
                         \----------------------------------jn-------------j1,j2

Notes:

  1. Pastorals cluster early, then with Jude, then with Philippians (!).
  2. Eph/Col in the middle of Paul.
  3. Johannines (except for the short 3 John) are together.

Case VIII: Shared Vocabulary (2nd norm.)

SC12 - [by chapters]

25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00  
*--pp-*--ga-*--p1-jm-*--c2-*--rm-c1-------------hb-------rv-jn-ac-lk-mk,mt
 \     \     \        \     \-------t2-------------t1,tt
  \     \     \        \---------------------co,ep
   \     \     \--------------------------------------jd,p2
    \     \----------------------------q1,q2
     \------------------------------------j1----------------------------pm-j2,j3

Notes:

  1. This is pathological, showing that third-order contacts are more stable.

Conclusions

  1. On a per-book basis, the Pastorals do not reflect Paul's style; however, in the chapter analysis, they are closely associated with the final chapters of Paul's letters in phraseology but not vocabulary. Since the final chapters of Paul's letters tend to be the most personal, and the Pastorals are personal, the well-known difference in style could be due to this factor. This will need more investigation.
  2. The Johannine Epistles are very similar to the style of John. They also are consistently close to Jn 14-17.
  3. The Colossians-Ephesians corpus is Pauline as a whole.
  4. Common early clusters: 1,2 Thess; Jude/2 Pet.; Col./Eph.; Pastorals; Johannines; and Synoptics.
  5. Hebrews and Revelation are distinct. Hebrews is probably not by Paul (or translated by Luke), and Revelation is probably not by John.

Stephen Carlson