"Proving the Obvious" -- A Basic Stylometric Study

Post by Peter Kirby »

For many, decades of work sifting through the patristic authors with conventional methods have already settled the "obvious" facts of authorship.

To some, this may not be enough; they may be interested in knowing whether non-conventional methods can confirm these results. To others, this may be enough, but they may be skeptical of non-conventional methods of authorship attribution. If you fall into either of these two groups, you may be interested in this study about "proving the obvious" in patristic authorship attribution.

There are (at least) two different ways that we can "prove the obvious." For a single work, we can extract a sample from it, designate it the test sample, and put the rest of the work among the candidate authors. If the rest of the work is still identified as the most likely author of the sample, this can be said to help confirm the unity of the work (or to help confirm the validity of the stylometric test, depending on your point of view). Or, for different works, we can try to see whether a sample from one work is attributed to the author of another.
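The first approach, holding out part of a work and testing it against the remainder, can be sketched roughly as below. This is only an illustration: the function name `hold_out` and the representation of a work as a list of chapter texts are assumptions made for the example, not part of the software that produced the results in this thread.

```python
def hold_out(chapters, first, last):
    """Split a work into a test sample and the remainder.

    `chapters` is a list of chapter texts; `first` and `last` are 1-based,
    inclusive chapter numbers for the test sample (hypothetical interface).
    """
    if not 1 <= first <= last <= len(chapters):
        raise ValueError("chapter range out of bounds")
    sample = chapters[first - 1:last]
    remainder = chapters[:first - 1] + chapters[last:]
    return sample, remainder


# Toy example: a 25-chapter work, with chapters 1 to 13 held out as the sample.
work = [f"chapter {n}" for n in range(1, 26)]
sample, remainder = hold_out(work, 1, 13)
```

The remainder would then be added to the list of candidate authors, and the sample tested against every candidate in the usual way.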

The other thread has a study where we "proved the obvious" with respect to Justin Martyr: we concluded that the most likely author of the "First Apology" and of the "Second Apology" was the author of the "Dialogue with Trypho." Let's do some more exercises like that.

(1) "Proving the Obvious" -- Athenagoras of Athens

Athenagoras has two works attributed to him: "A Plea for the Christians" and "On the Resurrection of the Dead." We'd like to know if they had the same author.

Author Group
#1 Justin #2 Tatian #3 Athenagoras #4 Irenaeus #5 Clement Alex. #6 Origen #7 Theophilus (Plea)

Control Group
#1 Josephus #2 Acts #3 Mark #4 John #5 1Cor #6 Hebrews #7 Revelation
#8 Life of Adam and Eve #9 1 Maccabees #10 2 Maccabees
#11 Polybius #12 Diodorus Siculus #13 Dionysius Halicarnassus #14 Strabo #15 Plutarch #16 Arrian
#17 Herodian #18 Herodotus #19 Thucydides #20 Xenophon #21 Epictetus #22 Galen
#23 Lucian #24 Philostratus #25 Basil #26 John Chrysostom

This test is with chapters 1 to 13 of "On the Resurrection of the Dead."
testsize: 4549
$VAR1 = 262; $VAR2 = 249; $VAR3 = 110; $VAR4 = 265; $VAR5 = 36; $VAR6 = 91; $VAR7 = 44; $VAR8 = 26; $VAR9 = 47; $VAR10 = 37; $VAR11 = 28; $VAR12 = 32; $VAR13 = 21; $VAR14 = 26; $VAR15 = 10; $VAR16 = 15; $VAR17 = 9; $VAR18 = 39; $VAR19 = 52; $VAR20 = 26;

20 Words
$VAR1 = [ 'O', 'OI', 'H', 'AI', 'TO', 'TA' ]; $VAR2 = [ 'TOU', 'TWN', 'THS' ]; $VAR3 = [ 'EIMI', 'EI', 'ESTI', 'ESTIN', 'ESMEN', 'ESTE', 'EISI', 'EISIN', 'HN', 'HSQA', 'HN', 'HMEN', 'HTE', 'HSAN', 'ESOMAI', 'ESHi', 'ESEI', 'ESTAI', 'ESOMEQA', 'ESESQE', 'ESONTAI', 'W', 'HiS', 'Hi', 'WMEN', 'HTE', 'WSI', 'EIHN', 'EIHS', 'EIH', 'EIHMEN', 'EIMEN', 'EIHTE', 'EITE', 'EIHSAN', 'EIEN', 'ESOIMHN', 'ESOIO', 'ESOITO', 'ESOIMEQA', 'ESOISQE', 'ESOINTO', 'ISQI', 'ESTW', 'ESTE', 'ESTWN', 'ONTWN', 'ESTWSAN', 'EINAI', 'ESESQAI', 'WN', 'OUSA', 'ON', 'ESOMENOS', 'ESOMENH', 'ESOMENON' ]; $VAR4 = [ 'KAI' ]; $VAR5 = [ 'TE' ]; $VAR6 = [ 'DE', 'D' ]; $VAR7 = [ 'MEN' ]; $VAR8 = [ 'ALLA', 'ALL' ]; $VAR9 = [ 'GAR' ]; $VAR10 = [ 'EIS' ]; $VAR11 = [ 'EN' ]; $VAR12 = [ 'DIA', 'DI' ]; $VAR13 = [ 'PARA', 'PAR' ]; $VAR14 = [ 'MH' ]; $VAR15 = [ 'OTI' ]; $VAR16 = [ 'UPO', 'UP' ]; $VAR17 = [ 'META', 'MET', 'MEQ' ]; $VAR18 = [ 'PAS', 'PANTOS', 'PANTI', 'PANTA', 'PAS', 'PASA', 'PASHS', 'PASHi', 'PASAN', 'PASA', 'PAN', 'PANTOS', 'PANTI', 'PAN', 'PANTES', 'PANTWN', 'PASI', 'PASIN', 'PANTAS', 'PANTES', 'PASAI', 'PASWN', 'PASAIS', 'PASAS', 'PASAI', 'PANTA', 'PANTWN' ]; $VAR19 = [ 'OU', 'OUK', 'OUX' ]; $VAR20 = [ 'PERI' ];
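The counts listed under "testsize" appear to be the number of times any form in each word group occurs in the sample. A minimal sketch of that kind of tally is below, assuming the text is already transliterated in the same scheme as the lists (with lowercase `i` marking iota subscript). The group names, the tokenizer, and the toy sentence are assumptions for illustration; the actual script is not shown in this thread.

```python
import re
from collections import Counter

# A small, hypothetical subset of the word groups listed above.
WORD_GROUPS = {
    "article": ["O", "OI", "H", "AI", "TO", "TA"],
    "KAI": ["KAI"],
    "DE": ["DE", "D"],
    "OU": ["OU", "OUK", "OUX"],
}


def count_word_groups(text, groups):
    """Return (number of tokens, {group name: occurrences of any listed form})."""
    # Case is preserved because forms such as 'TWi' depend on the lowercase 'i'.
    tokens = re.findall(r"[A-Za-z]+", text)
    freq = Counter(tokens)
    # set() guards against forms that are listed twice within a group.
    counts = {
        name: sum(freq[form] for form in set(forms))
        for name, forms in groups.items()
    }
    return len(tokens), counts


# Toy input, not taken from any of the works tested here.
size, counts = count_word_groups("O LOGOS KAI O QEOS OUK HN", WORD_GROUPS)
```

Deduplicating each list matters because several of the groups above repeat a form (for example, 'HN' occurs twice in the list for the verb "to be"), which would otherwise be counted twice.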

Author Chi-Square-Based P-Values
$VAR1 = '6.78208518925442e-13'; $VAR2 = 0; $VAR3 = 0; $VAR4 = '6.2740714914153e-09'; $VAR5 = '8.00857141630174e-19'; $VAR6 = '5.00248383009696e-10'; $VAR7 = '1.22722438940479e-05';

Control Chi-Square-Based P-Values
$VAR1 = '1.99011125575588e-12'; $VAR2 = 0; $VAR3 = 0; $VAR4 = 0; $VAR5 = 0; $VAR6 = 0; $VAR7 = 0; $VAR8 = '3.86877337400529e-11'; $VAR9 = 0; $VAR10 = 0; $VAR11 = '6.31965276778601e-55'; $VAR12 = 0; $VAR13 = '2.5780686660294e-36'; $VAR14 = '1.60883063456591e-08'; $VAR15 = 0; $VAR16 = 0; $VAR17 = '8.25828431355798e-44'; $VAR18 = 0; $VAR19 = '9.69238160174912e-27'; $VAR20 = '3.11688969336939e-25'; $VAR21 = 0; $VAR22 = '1.43699318835081e-48'; $VAR23 = '1.43475221102759e-25'; $VAR24 = '7.0774094703045e-55'; $VAR25 = '1.46224471191943e-14'; $VAR26 = '1.24758427368846e-12';
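How these chi-square p-values were computed is not spelled out here, so the following is only one plausible reading: a goodness-of-fit test of the sample's word-group counts against the counts expected from a candidate's own rates. The function names, the extra "all other words" category, and the assumption that the groups do not overlap are mine. The closed-form tail probability used below is valid only for an even number of degrees of freedom (as with 20 word groups plus a remainder category); an odd number would call for a statistics library.

```python
import math


def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic for matched observed/expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def sample_p_value(group_counts, test_size, candidate_rates):
    """P-value for a sample's counts given a candidate's per-word rates.

    A remainder category ("all other words") is appended, so the degrees of
    freedom equal the number of word groups (an assumption of this sketch).
    """
    observed = list(group_counts) + [test_size - sum(group_counts)]
    rates = list(candidate_rates) + [1.0 - sum(candidate_rates)]
    expected = [r * test_size for r in rates]
    return chi_square_sf_even_df(
        chi_square_statistic(observed, expected), len(group_counts)
    )


# Toy numbers: two word groups in a 100-word sample.
p = sample_p_value([30, 10], 100, [0.25, 0.15])
```

Under this reading, a small p-value means the sample's counts are unlikely given that candidate's rates, which is why the closest candidate has the largest value.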

Bayesian Author Test: Posterior Probabilities from Equal Priors, Chi-Square-Based Method
$VAR1 = '5.52331195906894e-08'; $VAR2 = '0'; $VAR3 = '0'; $VAR4 = '0.000510958696825175'; $VAR5 = '6.52215904760404e-14'; $VAR6 = '4.07400939280461e-05'; $VAR7 = '0.999448245976062';
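The posterior probabilities above are consistent with simply treating each candidate's p-value as a likelihood and normalizing over the candidates, which is what Bayes' theorem reduces to when every candidate has the same prior. The sketch below uses the author p-values from this test; the function name is an assumption.

```python
def posteriors_from_equal_priors(p_values):
    """Normalize p-values (used as likelihoods) into posterior probabilities."""
    total = sum(p_values)
    if total == 0:
        raise ValueError("all likelihoods are zero")
    return [p / total for p in p_values]


# Author chi-square-based p-values for chapters 1 to 13, copied from above.
author_p = [
    6.78208518925442e-13, 0.0, 0.0, 6.2740714914153e-09,
    8.00857141630174e-19, 5.00248383009696e-10, 1.22722438940479e-05,
]
posteriors = posteriors_from_equal_priors(author_p)
print(round(posteriors[6], 6))  # 0.999448, matching $VAR7 above
```

Because the result depends only on the ratios between p-values, a candidate can receive a posterior near 1 even when its own p-value is tiny in absolute terms, as it is here.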

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Chi-Square-Based Method
$VAR1 = 7; $VAR2 = '0.998690765672404'; $VAR3 = 14; $VAR4 = '0.00130923432759604';
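The four values in this comparison read as: index of the best author candidate, its posterior, index of the best control, and its posterior. They are consistent with picking the largest p-value from each group and normalizing just those two. A sketch under that assumption (the function name is hypothetical):

```python
def best_author_vs_best_control(author_p, control_p):
    """Return (author #, posterior, control #, posterior) with 1-based numbers."""
    a_idx = max(range(len(author_p)), key=author_p.__getitem__)
    c_idx = max(range(len(control_p)), key=control_p.__getitem__)
    a, c = author_p[a_idx], control_p[c_idx]
    total = a + c
    if total == 0:
        raise ValueError("both likelihoods are zero")
    return a_idx + 1, a / total, c_idx + 1, c / total


# Chi-square-based p-values for chapters 1 to 13, copied from above.
author_p = [
    6.78208518925442e-13, 0.0, 0.0, 6.2740714914153e-09,
    8.00857141630174e-19, 5.00248383009696e-10, 1.22722438940479e-05,
]
control_p = [
    1.99011125575588e-12, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
    3.86877337400529e-11, 0.0, 0.0, 6.31965276778601e-55, 0.0,
    2.5780686660294e-36, 1.60883063456591e-08, 0.0, 0.0,
    8.25828431355798e-44, 0.0, 9.69238160174912e-27,
    3.11688969336939e-25, 0.0, 1.43699318835081e-48,
    1.43475221102759e-25, 7.0774094703045e-55,
    1.46224471191943e-14, 1.24758427368846e-12,
]
result = best_author_vs_best_control(author_p, control_p)
```

For this sample the sketch selects author #7 and control #14, the same pair reported above.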

Percentage of Samples in the Best Author Candidate that Meet the P-Value>1.2e-05 Test, Chi-Square-Based Method
1
Percentage of Samples outside the Best Author Candidate that Meet the P-Value>1.2e-05 Test, Chi-Square-Based Method
0.13013698630137
Posterior Probability of a Sample Meeting the Test Being by the Best Author Candidate (with Prior = 0.5), not Any Other, Chi-Square-Based Method
0.884848484848485
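The "posterior probability of a sample meeting the test" figures are consistent with Bayes' theorem applied to two pass rates: the share of the best candidate's own samples that clear the p-value threshold, and the share of the comparison group's samples that clear it. A sketch of that calculation follows; the function and parameter names are assumptions.

```python
def posterior_given_pass(rate_in_best, rate_in_other, prior=0.5):
    """P(sample is by the best candidate | sample passes the threshold)."""
    numerator = prior * rate_in_best
    denominator = numerator + (1.0 - prior) * rate_in_other
    if denominator == 0:
        raise ValueError("no sample in either group passes the threshold")
    return numerator / denominator


# Pass rates reported for chapters 1 to 13, chi-square-based method.
vs_all_others = posterior_given_pass(1.0, 0.13013698630137)
vs_second_best = posterior_given_pass(1.0, 0.666666666666667)
print(round(vs_all_others, 6))   # 0.884848
print(round(vs_second_best, 6))  # 0.6
```

With a prior of 0.5 the prior cancels out, so the result is just the best candidate's pass rate divided by the sum of the two pass rates.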

Percentage of Samples in the Second-Best Author Candidate that Meet the P-Value>1.2e-05 Test, Chi-Square-Based Method
0.666666666666667
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Second-Best Author, Chi-Square-Based Method
0.6

Percentage of Samples in the Best Control Candidate that Meet the P-Value>1.2e-05 Test, Chi-Square-Based Method
0
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Best Control Author, Chi-Square-Based Method
1

Author Z-Score-Based P-Values
$VAR1 = '0.0389287296821748'; $VAR2 = '1.8655703642448e-07'; $VAR3 = '0.000152534988677256'; $VAR4 = '0.035673234051694'; $VAR5 = '0.0221786718454341'; $VAR6 = '0.0703532842848609'; $VAR7 = '0.0944429198795122';

Control Z-Score-Based P-Values
$VAR1 = '0.0214937707902393'; $VAR2 = '8.54189703175516e-10'; $VAR3 = '8.73529752573435e-191'; $VAR4 = '7.34026268611622e-19'; $VAR5 = '0'; $VAR6 = '0'; $VAR7 = '1.18954573120418e-88'; $VAR8 = '0.0738675435889026'; $VAR9 = '1.08423044904729e-42'; $VAR10 = '7.6244671299611e-10'; $VAR11 = '0.000387919061932858'; $VAR12 = '0.000166037074752847'; $VAR13 = '0.000573058786492357'; $VAR14 = '0.0707088249480558'; $VAR15 = '5.80942813963615e-39'; $VAR16 = '3.34111644599952e-06'; $VAR17 = '4.14163755031612e-11'; $VAR18 = '7.20945240268543e-07'; $VAR19 = '6.45906442601326e-06'; $VAR20 = '0.00501914757346439'; $VAR21 = '4.42760970269781e-17'; $VAR22 = '0.00810365288644468'; $VAR23 = '0.0138349455340384'; $VAR24 = '4.28365673401927e-08'; $VAR25 = '0.0358336853377963'; $VAR26 = '0.0132999336439379';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.148736464808925'; $VAR2 = '7.12785500825437e-07'; $VAR3 = '0.000582796180629365'; $VAR4 = '0.136298069946523'; $VAR5 = '0.0847388874843662'; $VAR6 = '0.268801445042245'; $VAR7 = '0.36084162375181';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 7; $VAR2 = '0.561123283325967'; $VAR3 = 8; $VAR4 = '0.438876716674033';

Percentage of Samples in the Best Author Candidate that Meet the P-Value>0.09 Test, Z-Score-Based Method
1
Percentage of Samples outside the Best Author Candidate that Meet the P-Value>0.09 Test, Z-Score-Based Method
0.0273972602739726
Posterior Probability of a Sample Meeting the Test Being by the Best Author Candidate (with Prior = 0.5), not Any Other, Z-Score-Based Method
0.973333333333333

Percentage of Samples in the Second-Best Author Candidate that Meet the P-Value>0.09 Test, Z-Score-Based Method
0.4
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Second-Best Author, Z-Score-Based Method
0.714285714285714

Percentage of Samples in the Best Control Candidate that Meet the P-Value>0.09 Test, Z-Score-Based Method
0
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Best Control Author, Z-Score-Based Method
1

This test is with chapters 14 to 25 of "On the Resurrection of the Dead."
testsize: 4346
$VAR1 = 239; $VAR2 = 299; $VAR3 = 92; $VAR4 = 222; $VAR5 = 27; $VAR6 = 113; $VAR7 = 37; $VAR8 = 19; $VAR9 = 48; $VAR10 = 10; $VAR11 = 24; $VAR12 = 26; $VAR13 = 5; $VAR14 = 24; $VAR15 = 9; $VAR16 = 3; $VAR17 = 8; $VAR18 = 36; $VAR19 = 63; $VAR20 = 19;

20 Words
$VAR1 = [ 'O', 'OI', 'H', 'AI', 'TO', 'TA' ]; $VAR2 = [ 'TOU', 'TWN', 'THS' ]; $VAR3 = [ 'EIMI', 'EI', 'ESTI', 'ESTIN', 'ESMEN', 'ESTE', 'EISI', 'EISIN', 'HN', 'HSQA', 'HN', 'HMEN', 'HTE', 'HSAN', 'ESOMAI', 'ESHi', 'ESEI', 'ESTAI', 'ESOMEQA', 'ESESQE', 'ESONTAI', 'W', 'HiS', 'Hi', 'WMEN', 'HTE', 'WSI', 'EIHN', 'EIHS', 'EIH', 'EIHMEN', 'EIMEN', 'EIHTE', 'EITE', 'EIHSAN', 'EIEN', 'ESOIMHN', 'ESOIO', 'ESOITO', 'ESOIMEQA', 'ESOISQE', 'ESOINTO', 'ISQI', 'ESTW', 'ESTE', 'ESTWN', 'ONTWN', 'ESTWSAN', 'EINAI', 'ESESQAI', 'WN', 'OUSA', 'ON', 'ESOMENOS', 'ESOMENH', 'ESOMENON' ]; $VAR4 = [ 'KAI' ]; $VAR5 = [ 'TE' ]; $VAR6 = [ 'DE', 'D' ]; $VAR7 = [ 'MEN' ]; $VAR8 = [ 'ALLA', 'ALL' ]; $VAR9 = [ 'GAR' ]; $VAR10 = [ 'EIS' ]; $VAR11 = [ 'EN' ]; $VAR12 = [ 'DIA', 'DI' ]; $VAR13 = [ 'PARA', 'PAR' ]; $VAR14 = [ 'MH' ]; $VAR15 = [ 'OTI' ]; $VAR16 = [ 'UPO', 'UP' ]; $VAR17 = [ 'META', 'MET', 'MEQ' ]; $VAR18 = [ 'PAS', 'PANTOS', 'PANTI', 'PANTA', 'PAS', 'PASA', 'PASHS', 'PASHi', 'PASAN', 'PASA', 'PAN', 'PANTOS', 'PANTI', 'PAN', 'PANTES', 'PANTWN', 'PASI', 'PASIN', 'PANTAS', 'PANTES', 'PASAI', 'PASWN', 'PASAIS', 'PASAS', 'PASAI', 'PANTA', 'PANTWN' ]; $VAR19 = [ 'OU', 'OUK', 'OUX' ]; $VAR20 = [ 'PERI' ];

Author Chi-Square-Based P-Values
$VAR1 = '4.22830039720807e-09'; $VAR2 = 0; $VAR3 = 0; $VAR4 = '1.33816564502871e-13'; $VAR5 = '4.01064292919044e-13'; $VAR6 = '7.53589896904617e-07'; $VAR7 = '0.0672041237835196';

Control Chi-Square-Based P-Values
$VAR1 = '4.06193934577003e-13'; $VAR2 = 0; $VAR3 = 0; $VAR4 = 0; $VAR5 = 0; $VAR6 = 0; $VAR7 = 0; $VAR8 = '7.937563141249e-11'; $VAR9 = 0; $VAR10 = 0; $VAR11 = '1.85332880977305e-24'; $VAR12 = '4.94084904344495e-05'; $VAR13 = '4.57332191655007e-27'; $VAR14 = '4.05838859959686e-05'; $VAR15 = 0; $VAR16 = 0; $VAR17 = 0; $VAR18 = 0; $VAR19 = '5.26570948465044e-40'; $VAR20 = '3.00854848600267e-35'; $VAR21 = 0; $VAR22 = '2.53933242760227e-12'; $VAR23 = 0; $VAR24 = 0; $VAR25 = '1.63500582106775e-08'; $VAR26 = '2.81098354485038e-06';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Chi-Square-Based Method
$VAR1 = '6.29165664219605e-08'; $VAR2 = '0'; $VAR3 = '0'; $VAR4 = '1.99117327956704e-12'; $VAR5 = '5.96778512746618e-12'; $VAR6 = '1.12133208025675e-05'; $VAR7 = '0.999988723754672';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Chi-Square-Based Method
$VAR1 = 7; $VAR2 = '0.999265339845152'; $VAR3 = 12; $VAR4 = '0.000734660154847873';

Percentage of Samples in the Best Author Candidate that Meet the P-Value>0.06 Test, Chi-Square-Based Method
0.75
Percentage of Samples outside the Best Author Candidate that Meet the P-Value>0.06 Test, Chi-Square-Based Method
0.0196078431372549
Posterior Probability of a Sample Meeting the Test Being by the Best Author Candidate (with Prior = 0.5), not Any Other, Chi-Square-Based Method
0.97452229299363


Percentage of Samples in the Second-Best Author Candidate that Meet the P-Value>0.06 Test, Chi-Square-Based Method
0
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Second-Best Author, Chi-Square-Based Method
1


Percentage of Samples in the Best Control Candidate that Meet the P-Value>0.06 Test, Chi-Square-Based Method
0
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Best Control Author, Chi-Square-Based Method
1


Author Z-Score-Based P-Values
$VAR1 = '0.0159168379659639'; $VAR2 = '3.09950295206002e-64'; $VAR3 = '2.18574082592966e-53'; $VAR4 = '0.00447413456724124'; $VAR5 = '0.0376872781408746'; $VAR6 = '0.0218153509881043'; $VAR7 = '0.0391998767696662';

Control Z-Score-Based P-Values
$VAR1 = '0.0162302425707405'; $VAR2 = '5.53230945341289e-08'; $VAR3 = '2.86076339963524e-90'; $VAR4 = '1.0147531289995e-13'; $VAR5 = '0'; $VAR6 = '0'; $VAR7 = '1.48444938940122e-74'; $VAR8 = '0.0417834216921657'; $VAR9 = '3.46749737698547e-33'; $VAR10 = '1.54813955547156e-16'; $VAR11 = '0.00181227145321975'; $VAR12 = '0.0209517477291002'; $VAR13 = '0.000413763912412668'; $VAR14 = '0.0291759586222472'; $VAR15 = '8.82632743916466e-94'; $VAR16 = '1.32935276991962e-05'; $VAR17 = '8.84001579464579e-05'; $VAR18 = '0.000161715409380325'; $VAR19 = '0.00499007581385892'; $VAR20 = '0.0014423205156063'; $VAR21 = '9.25170217596236e-10'; $VAR22 = '0.00909302549363535'; $VAR23 = '0.00521476799533951'; $VAR24 = '2.13968435997231e-05'; $VAR25 = '0.0433661923019287'; $VAR26 = '0.027857899453689';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.133649954435348'; $VAR2 = '2.60257991694623e-63'; $VAR3 = '1.83531529577451e-52'; $VAR4 = '0.0375682583643865'; $VAR5 = '0.316451233410238'; $VAR6 = '0.183178384537554'; $VAR7 = '0.329152169252473';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 7; $VAR2 = '0.474769808111793'; $VAR3 = 25; $VAR4 = '0.525230191888207';

Percentage of Samples in the Best Author Candidate that Meet the P-Value>0.03 Test, Z-Score-Based Method
1
Percentage of Samples outside the Best Author Candidate that Meet the P-Value>0.03 Test, Z-Score-Based Method
0.196078431372549
Posterior Probability of a Sample Meeting the Test Being by the Best Author Candidate (with Prior = 0.5), not Any Other, Z-Score-Based Method
0.836065573770492

Percentage of Samples in the Second-Best Author Candidate that Meet the P-Value>0.03 Test, Z-Score-Based Method
0.625
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Second-Best Author, Z-Score-Based Method
0.615384615384615

Percentage of Samples in the Best Control Candidate that Meet the P-Value>0.03 Test, Z-Score-Based Method
0.714285714285714
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Best Control Author, Z-Score-Based Method
0.583333333333333
Both samples seem most likely to be by the author of the Plea for the Christians.

(2) "Proving the Obvious" -- Irenaeus of Lyons

Parts of the Adv. Haer. (in five books) are extant in Greek, as are other fragments. We'd like to know if they have the same author.

The first test compares some extant parts of books 2 and 3 of the Adv. Haer. in Greek to a long section of book 1 extant in Greek.

The author candidates and the controls are the same as in the previous test above, for Athenagoras.
testsize: 2085
$VAR1 = 123; $VAR2 = 123; $VAR3 = 54; $VAR4 = 79; $VAR5 = 156; $VAR6 = 2; $VAR7 = 12; $VAR8 = 16; $VAR9 = 20; $VAR10 = 38; $VAR11 = 13; $VAR12 = 9; $VAR13 = 15; $VAR14 = 4; $VAR15 = 13; $VAR16 = 8; $VAR17 = 15; $VAR18 = 6; $VAR19 = 14; $VAR20 = 4; $VAR21 = 2; $VAR22 = 6; $VAR23 = 44;

23 Words
$VAR1 = [ 'O', 'OI', 'H', 'AI', 'TO', 'TA' ]; $VAR2 = [ 'TOU', 'TWN', 'THS' ]; $VAR3 = [ 'TWi', 'TOIS', 'THi', 'TAIS' ]; $VAR4 = [ 'TON', 'TOUS', 'THN', 'TAS' ]; $VAR5 = [ 'KAI' ]; $VAR6 = [ 'TE' ]; $VAR7 = [ 'ALLA', 'ALL' ]; $VAR8 = [ 'GAR' ]; $VAR9 = [ 'EIS' ]; $VAR10 = [ 'EN' ]; $VAR11 = [ 'DIA', 'DI' ]; $VAR12 = [ 'EK', 'EC' ]; $VAR13 = [ 'KATA', 'KAT', 'KAQ' ]; $VAR14 = [ 'PROS' ]; $VAR15 = [ 'APO', 'AP' ]; $VAR16 = [ 'META', 'MET', 'MEQ' ]; $VAR17 = [ 'OU', 'OUK', 'OUX' ]; $VAR18 = [ 'POLUS', 'POLLOU', 'POLLWi', 'POLUN', 'POLLH', 'POLLHS', 'POLLHi', 'POLLHN', 'POLU', 'POLLOU', 'POLLWi', 'POLU', 'POLLOI', 'POLLWN', 'POLLOIS', 'POLLOUS', 'POLLAI', 'POLLWN', 'POLLAIS', 'POLLAS', 'POLLA', 'POLLWN', 'POLLOIS', 'POLLA' ]; $VAR19 = [ 'PAS', 'PANTOS', 'PANTI', 'PANTA', 'PAS', 'PASA', 'PASHS', 'PASHi', 'PASAN', 'PASA', 'PAN', 'PANTOS', 'PANTI', 'PAN', 'PANTES', 'PANTWN', 'PASI', 'PASIN', 'PANTAS', 'PANTES', 'PASAI', 'PASWN', 'PASAIS', 'PASAS', 'PASAI', 'PANTA', 'PANTWN' ]; $VAR20 = [ 'ODE', 'TOUDE', 'TWiDE', 'TONDE', 'OIDE', 'TWNDE', 'TOISDE', 'TOUSDE', 'HDE', 'THSDE', 'THiDE', 'THNDE', 'AIDE', 'TWNDE', 'TAISDE', 'TASDE', 'TODE', 'TOUDE', 'TODE', 'TADE', 'EKEINOS', 'EKEINOU', 'EKEINWi', 'EKEINON', 'EKEINOI', 'EKEINWN', 'EKEINOIS', 'EKEINOUS', 'EKEINH', 'EKEINHS', 'EKEINHi', 'EKEINHN', 'EKEINAI', 'EKEINAIS', 'EKEINAS', 'EKEINO', 'EKEINOU', 'EKEINWi', 'EKEINO', 'EKEINA', 'EKEINWN', 'EKEINOIS', 'EKEINA' ]; $VAR21 = [ 'PERI' ]; $VAR22 = [ 'TIS', 'TINOS', 'TINI', 'TINA', 'TINES', 'TINWN', 'TISI', 'TISIN', 'TINAS', 'TI', 'TINA' ]; $VAR23 = [ 'EIMI', 'EI', 'ESTI', 'ESTIN', 'ESMEN', 'ESTE', 'EISI', 'EISIN', 'HN', 'HSQA', 'HN', 'HMEN', 'HTE', 'HSAN', 'ESOMAI', 'ESHi', 'ESEI', 'ESTAI', 'ESOMEQA', 'ESESQE', 'ESONTAI', 'W', 'HiS', 'Hi', 'WMEN', 'HTE', 'WSI', 'EIHN', 'EIHS', 'EIH', 'EIHMEN', 'EIMEN', 'EIHTE', 'EITE', 'EIHSAN', 'EIEN', 'ESOIMHN', 'ESOIO', 'ESOITO', 'ESOIMEQA', 'ESOISQE', 'ESOINTO', 'ISQI', 'ESTW', 'ESTE', 'ESTWN', 'ONTWN', 'ESTWSAN', 'EINAI', 'ESESQAI', 'WN', 
'OUSA', 'ON', 'ESOMENOS', 'ESOMENH', 'ESOMENON' ];

Author Chi-Square-Based P-Values
$VAR1 = '0.169008340032144'; $VAR2 = '2.45390097771078e-25'; $VAR3 = '1.28518884338707e-06'; $VAR4 = '0.188849079022568'; $VAR5 = '2.26690968497709e-06'; $VAR6 = '0.438939017666361'; $VAR7 = '0.0857489870993074';

Control Chi-Square-Based P-Values
$VAR1 = '5.38974260393018e-21'; $VAR2 = '2.88230382313125e-11'; $VAR3 = '1.01055994115002e-18'; $VAR4 = '1.77563596406316e-31'; $VAR5 = 0; $VAR6 = 0; $VAR7 = 0; $VAR8 = '0.000743975160904467'; $VAR9 = '1.24752037975246e-39'; $VAR10 = '3.87635670754786e-41'; $VAR11 = '6.39159227540956e-12'; $VAR12 = '5.05096813308015e-05'; $VAR13 = '1.519696491101e-06'; $VAR14 = '0.0183030717673731'; $VAR15 = 0; $VAR16 = '4.75734722879094e-23'; $VAR17 = '3.34398755492644e-34'; $VAR18 = 0; $VAR19 = '2.8446800457453e-16'; $VAR20 = '8.6288695815938e-14'; $VAR21 = '1.73969791074334e-45'; $VAR22 = '1.27368590034946e-10'; $VAR23 = '2.61073173183736e-12'; $VAR24 = '2.22436267905913e-41'; $VAR25 = '0.14097608436432'; $VAR26 = '2.27259006723311e-12';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Chi-Square-Based Method
$VAR1 = '0.191500239243009'; $VAR2 = '2.78047003018249e-25'; $VAR3 = '1.45622382264841e-06'; $VAR4 = '0.213981415395037'; $VAR5 = '2.5685936382361e-06'; $VAR6 = '0.497353721598666'; $VAR7 = '0.0971605989458269';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Chi-Square-Based Method
$VAR1 = 6; $VAR2 = '0.756902201941861'; $VAR3 = 25; $VAR4 = '0.243097798058139';

Percentage of Samples in the Best Author Candidate that Meet the P-Value>0.43 Test, Chi-Square-Based Method
0.545454545454545
Percentage of Samples outside the Best Author Candidate that Meet the P-Value>0.43 Test, Chi-Square-Based Method
0.0716417910447761
Posterior Probability of a Sample Meeting the Test Being by the Best Author Candidate (with Prior = 0.5), not Any Other, Chi-Square-Based Method
0.883905013192612


Percentage of Samples in the Second-Best Author Candidate that Meet the P-Value>0.43 Test, Chi-Square-Based Method
0.142857142857143
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Second-Best Author, Chi-Square-Based Method
0.792452830188679

Percentage of Samples in the Best Control Candidate that Meet the P-Value>0.43 Test, Chi-Square-Based Method
0.1875
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Best Control Author, Chi-Square-Based Method
0.744186046511628

Author Z-Score-Based P-Values
$VAR1 = '0.189141540591001'; $VAR2 = '0.00484568420146235'; $VAR3 = '0.0475379464199051'; $VAR4 = '0.191930618542426'; $VAR5 = '0.0698581336939889'; $VAR6 = '0.178392054031543'; $VAR7 = '0.188981273344321';

Control Z-Score-Based P-Values
$VAR1 = '0.027185503565873'; $VAR2 = '0.0507915870747712'; $VAR3 = '0.0145150313485325'; $VAR4 = '0.00786693767559947'; $VAR5 = '2.95330914206815e-06'; $VAR6 = '9.75332471207117e-07'; $VAR7 = '4.42890306063041e-05'; $VAR8 = '0.132226315772581'; $VAR9 = '0.00107724310049605'; $VAR10 = '0.0100984739487528'; $VAR11 = '0.0219553277356833'; $VAR12 = '0.0436827296332505'; $VAR13 = '0.0317408166687835'; $VAR14 = '0.0828248476874279'; $VAR15 = '0.00139009651703996'; $VAR16 = '0.0137367298340031'; $VAR17 = '3.01848374527028e-06'; $VAR18 = '0.000622785250379873'; $VAR19 = '0.000263549776021878'; $VAR20 = '0.0283692012386948'; $VAR21 = '0.00109391535239875'; $VAR22 = '0.020939574340014'; $VAR23 = '0.0350381593500224'; $VAR24 = '8.15079568976157e-06'; $VAR25 = '0.128657538409141'; $VAR26 = '0.0237294004044504';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.217232468273609'; $VAR2 = '0.00556535563932157'; $VAR3 = '0.054598188241393'; $VAR4 = '0.220435774568473'; $VAR5 = '0.0802333256032228'; $VAR6 = '0.204886489221685'; $VAR7 = '0.217048398452295';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 4; $VAR2 = '0.592091663712221'; $VAR3 = 8; $VAR4 = '0.407908336287779';

Percentage of Samples in the Best Author Candidate that Meet the P-Value>0.19 Test, Z-Score-Based Method
0.857142857142857
Percentage of Samples outside the Best Author Candidate that Meet the P-Value>0.19 Test, Z-Score-Based Method
0
Posterior Probability of a Sample Meeting the Test Being by the Best Author Candidate (with Prior = 0.5), not Any Other, Z-Score-Based Method
1


Percentage of Samples in the Second-Best Author Candidate that Meet the P-Value>0.19 Test, Z-Score-Based Method
0
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Second-Best Author, Z-Score-Based Method
1


Percentage of Samples in the Best Control Candidate that Meet the P-Value>0.19 Test, Z-Score-Based Method
0
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Best Control Author, Z-Score-Based Method
1
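Irenaeus (#4) is indicated as the most likely author by the Z-score-based method; the chi-square-based method instead ranks Origen (#6) first for this sample, with Irenaeus second.
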
testsize: 1496
$VAR1 = 74; $VAR2 = 109; $VAR3 = 24; $VAR4 = 71; $VAR5 = 91; $VAR6 = 5; $VAR7 = 5; $VAR8 = 15; $VAR9 = 14; $VAR10 = 18; $VAR11 = 19; $VAR12 = 17; $VAR13 = 4; $VAR14 = 4; $VAR15 = 5; $VAR16 = 5; $VAR17 = 11; $VAR18 = 2; $VAR19 = 15; $VAR20 = 4; $VAR21 = 2; $VAR22 = 3; $VAR23 = 40;

23 Words
$VAR1 = [ 'O', 'OI', 'H', 'AI', 'TO', 'TA' ]; $VAR2 = [ 'TOU', 'TWN', 'THS' ]; $VAR3 = [ 'TWi', 'TOIS', 'THi', 'TAIS' ]; $VAR4 = [ 'TON', 'TOUS', 'THN', 'TAS' ]; $VAR5 = [ 'KAI' ]; $VAR6 = [ 'TE' ]; $VAR7 = [ 'ALLA', 'ALL' ]; $VAR8 = [ 'GAR' ]; $VAR9 = [ 'EIS' ]; $VAR10 = [ 'EN' ]; $VAR11 = [ 'DIA', 'DI' ]; $VAR12 = [ 'EK', 'EC' ]; $VAR13 = [ 'KATA', 'KAT', 'KAQ' ]; $VAR14 = [ 'PROS' ]; $VAR15 = [ 'APO', 'AP' ]; $VAR16 = [ 'META', 'MET', 'MEQ' ]; $VAR17 = [ 'OU', 'OUK', 'OUX' ]; $VAR18 = [ 'POLUS', 'POLLOU', 'POLLWi', 'POLUN', 'POLLH', 'POLLHS', 'POLLHi', 'POLLHN', 'POLU', 'POLLOU', 'POLLWi', 'POLU', 'POLLOI', 'POLLWN', 'POLLOIS', 'POLLOUS', 'POLLAI', 'POLLWN', 'POLLAIS', 'POLLAS', 'POLLA', 'POLLWN', 'POLLOIS', 'POLLA' ]; $VAR19 = [ 'PAS', 'PANTOS', 'PANTI', 'PANTA', 'PAS', 'PASA', 'PASHS', 'PASHi', 'PASAN', 'PASA', 'PAN', 'PANTOS', 'PANTI', 'PAN', 'PANTES', 'PANTWN', 'PASI', 'PASIN', 'PANTAS', 'PANTES', 'PASAI', 'PASWN', 'PASAIS', 'PASAS', 'PASAI', 'PANTA', 'PANTWN' ]; $VAR20 = [ 'ODE', 'TOUDE', 'TWiDE', 'TONDE', 'OIDE', 'TWNDE', 'TOISDE', 'TOUSDE', 'HDE', 'THSDE', 'THiDE', 'THNDE', 'AIDE', 'TWNDE', 'TAISDE', 'TASDE', 'TODE', 'TOUDE', 'TODE', 'TADE', 'EKEINOS', 'EKEINOU', 'EKEINWi', 'EKEINON', 'EKEINOI', 'EKEINWN', 'EKEINOIS', 'EKEINOUS', 'EKEINH', 'EKEINHS', 'EKEINHi', 'EKEINHN', 'EKEINAI', 'EKEINAIS', 'EKEINAS', 'EKEINO', 'EKEINOU', 'EKEINWi', 'EKEINO', 'EKEINA', 'EKEINWN', 'EKEINOIS', 'EKEINA' ]; $VAR21 = [ 'PERI' ]; $VAR22 = [ 'TIS', 'TINOS', 'TINI', 'TINA', 'TINES', 'TINWN', 'TISI', 'TISIN', 'TINAS', 'TI', 'TINA' ]; $VAR23 = [ 'EIMI', 'EI', 'ESTI', 'ESTIN', 'ESMEN', 'ESTE', 'EISI', 'EISIN', 'HN', 'HSQA', 'HN', 'HMEN', 'HTE', 'HSAN', 'ESOMAI', 'ESHi', 'ESEI', 'ESTAI', 'ESOMEQA', 'ESESQE', 'ESONTAI', 'W', 'HiS', 'Hi', 'WMEN', 'HTE', 'WSI', 'EIHN', 'EIHS', 'EIH', 'EIHMEN', 'EIMEN', 'EIHTE', 'EITE', 'EIHSAN', 'EIEN', 'ESOIMHN', 'ESOIO', 'ESOITO', 'ESOIMEQA', 'ESOISQE', 'ESOINTO', 'ISQI', 'ESTW', 'ESTE', 'ESTWN', 'ONTWN', 'ESTWSAN', 'EINAI', 'ESESQAI', 'WN', 
'OUSA', 'ON', 'ESOMENOS', 'ESOMENH', 'ESOMENON' ];

Author Chi-Square-Based P-Values
$VAR1 = '0.128585198154302'; $VAR2 = 0; $VAR3 = '5.1199904743529e-05'; $VAR4 = '0.322601367993639'; $VAR5 = '7.23675146691969e-07'; $VAR6 = '0.00488474392739791'; $VAR7 = '0.0025135436601417';

Control Chi-Square-Based P-Values
$VAR1 = '4.06804356344907e-07'; $VAR2 = '2.32812109945295e-16'; $VAR3 = '2.06377465161763e-51'; $VAR4 = '1.32150863166511e-39'; $VAR5 = 0; $VAR6 = '9.15361331452747e-49'; $VAR7 = 0; $VAR8 = '0.328581210324402'; $VAR9 = '2.52892029285479e-41'; $VAR10 = '1.28180521781348e-11'; $VAR11 = '2.19783689322504e-06'; $VAR12 = '0.025626125481107'; $VAR13 = '5.24615058008337e-16'; $VAR14 = '0.00429005905269931'; $VAR15 = '1.26589419913224e-41'; $VAR16 = '5.55104523648815e-33'; $VAR17 = '1.90025070942155e-34'; $VAR18 = 0; $VAR19 = 0; $VAR20 = '1.25636692166003e-40'; $VAR21 = '1.89708582080793e-57'; $VAR22 = '9.70438958751707e-05'; $VAR23 = '3.44732462886991e-31'; $VAR24 = 0; $VAR25 = '0.44883144545292'; $VAR26 = '4.83557961880673e-07';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Chi-Square-Based Method
$VAR1 = '0.280363905631326'; $VAR2 = '0'; $VAR3 = '0.000111634974070827'; $VAR4 = '0.703391842847809'; $VAR5 = '1.57788294023868e-06'; $VAR6 = '0.0106505718010464'; $VAR7 = '0.00548046686280746';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Chi-Square-Based Method
$VAR1 = 4; $VAR2 = '0.418184658949547'; $VAR3 = 25; $VAR4 = '0.581815341050453';

Percentage of Samples in the Best Author Candidate that Meet the P-Value>0.32 Test, Chi-Square-Based Method
0.6
Percentage of Samples outside the Best Author Candidate that Meet the P-Value>0.32 Test, Chi-Square-Based Method
0.0626304801670146
Posterior Probability of a Sample Meeting the Test Being by the Best Author Candidate (with Prior = 0.5), not Any Other, Chi-Square-Based Method
0.905482041587902

Percentage of Samples in the Second-Best Author Candidate that Meet the P-Value>0.32 Test, Chi-Square-Based Method
0.176470588235294
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Second-Best Author, Chi-Square-Based Method
0.772727272727273

Percentage of Samples in the Best Control Candidate that Meet the P-Value>0.32 Test, Chi-Square-Based Method
0
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Best Control Author, Chi-Square-Based Method
1

Author Z-Score-Based P-Values
$VAR1 = '0.182854401766883'; $VAR2 = '0.0064360183150822'; $VAR3 = '0.063349147437644'; $VAR4 = '0.23460535022967'; $VAR5 = '0.0787441287239121'; $VAR6 = '0.0902797050185237'; $VAR7 = '0.158106918587784';

Control Z-Score-Based P-Values
$VAR1 = '0.0433558163416752'; $VAR2 = '0.0399063682264983'; $VAR3 = '0.0064197659468676'; $VAR4 = '0.00567816174162865'; $VAR5 = '6.94518501874106e-05'; $VAR6 = '0.000295164146635458'; $VAR7 = '0.00105509036862163'; $VAR8 = '0.214124359749043'; $VAR9 = '0.00219327938631219'; $VAR10 = '0.0302332314239487'; $VAR11 = '0.0445891418721447'; $VAR12 = '0.0970215681634584'; $VAR13 = '0.0387873274597366'; $VAR14 = '0.0770975831794432'; $VAR15 = '0.0121925121654343'; $VAR16 = '0.0134905297916449'; $VAR17 = '7.46978544112166e-05'; $VAR18 = '0.00408375751555584'; $VAR19 = '0.000622683340610001'; $VAR20 = '0.0243491636341886'; $VAR21 = '0.00226666680482889'; $VAR22 = '0.0384867944990566'; $VAR23 = '0.0203101085473562'; $VAR24 = '8.72402450049445e-05'; $VAR25 = '0.134170691747656'; $VAR26 = '0.0467880855537976';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.224533232616138'; $VAR2 = '0.00790300907989298'; $VAR3 = '0.07778860514271'; $VAR4 = '0.288080008832739'; $VAR5 = '0.0966926341454002'; $VAR6 = '0.110857566520511'; $VAR7 = '0.194144943662609';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 4; $VAR2 = '0.522821076948079'; $VAR3 = 8; $VAR4 = '0.477178923051921';

Percentage of Samples in the Best Author Candidate that Meet the P-Value>0.23 Test, Z-Score-Based Method
0.2
Percentage of Samples outside the Best Author Candidate that Meet the P-Value>0.23 Test, Z-Score-Based Method
0.00208768267223382
Posterior Probability of a Sample Meeting the Test Being by the Best Author Candidate (with Prior = 0.5), not Any Other, Z-Score-Based Method
0.989669421487603


Percentage of Samples in the Second-Best Author Candidate that Meet the P-Value>0.23 Test, Z-Score-Based Method
0.0294117647058824
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Second-Best Author, Z-Score-Based Method
0.871794871794872


Percentage of Samples in the Best Control Candidate that Meet the P-Value>0.23 Test, Z-Score-Based Method
0
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Best Control Author, Z-Score-Based Method
1
Irenaeus is indicated as the most likely author.

(to be continued?...)
"... almost every critical biblical position was earlier advanced by skeptics." - Raymond Brown

Post by Bernard Muller »

to Peter,
Peter wrote:
(to be continued?...)
Your method is interesting but unfortunately disappointing in that it is not able to determine the existence or non existence of Q.
Maybe these tests would allow us to be more informed:
Comparing the Lukan material (without Luke's (allegedly) Q material) with Matthew's (allegedly) Q material.
Comparing the Matthean material (without Matthew's (allegedly) Q material) with Luke's (allegedly) Q material.
Note: the Lukan & Matthean material should not include sections close to corresponding ones in gMark.

I am aware my proposal might not be feasible, due to the difficulty of finding adequate authors (in the authors group and in the control group).
If not, maybe your method can still be used in order to determine how different (stylometry-wise) two texts are, as compared with other matchups.

For example, if you compare 1 John with John's gospel, the conclusion is likely to indicate these two texts were written by the same author, due to the relatively close stylometry between the two.
However if you compare John's gospel with Revelation, the conclusion is likely to indicate these two texts were not written by the same author, due to the relatively remote stylometry between the two.

Another thing which interests me:
According to your own study on 1 Clement http://peterkirby.com/a-study-in-1-clem ... l#more-461, can you compare the strictly Jewish part (22:1 to 41:2 minus the obvious Christian insertions in 24:1, 32:2, 33:4, 36:1 & 38:1) with the rest of 1 Clement?

Cordially, Bernard
I believe freedom of expression should not be curtailed

Post by Peter Kirby »

Bernard Muller wrote: Your method is interesting but unfortunately disappointing in that it is not able to determine the existence or non existence of Q.
Maybe these tests would allow us to be more informed:
Comparing the Lukan material (without Luke's (allegedly) Q material) with Matthew's (allegedly) Q material.
Comparing the Matthean material (without Matthew's (allegedly) Q material) with Luke's (allegedly) Q material

I am aware my proposal might not be feasible, due to the difficulty to find adequate authors (in the authors group and in the control group).
If not, maybe your method can still be used in order to determine how different (stylometrywise) two texts are, as compare with other matchups.
Maybe it's "disappointing," but I don't think it should be surprising. The question of "determin[ing] the existence or non existence of Q" is entirely a question of human interpretation/analysis that is beyond the scope of what stylometry, strictly, can do--or, can ever do, in principle.

If the "double tradition" in Matthew could be identified as Matthean in style and uniformly so (i.e., if "Matthew" is the closest match for all the samples of the double tradition in Matthew), that might be an indication of sorts against "Q." But of course that would just be an interpretation of that result.
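As a rough sketch of the check described above (is one candidate the closest match for every sample?), assuming the per-sample closest matches have already been obtained from some stylometric run. The function name `closest_match_summary` and the sample labels are hypothetical, and the labels below are placeholders, not results.

```python
from collections import Counter

def closest_match_summary(closest_by_sample):
    """Tally the closest-match candidate per sample and report uniformity.

    closest_by_sample maps a sample label to the name of its closest candidate.
    Returns (tally, uniform), where uniform is True only if every sample
    points to the same candidate.
    """
    tally = Counter(closest_by_sample.values())
    uniform = len(tally) == 1
    return tally, uniform

# Placeholder labels for illustration only; these are not actual findings.
tally, uniform = closest_match_summary(
    {"sample 1": "Matthew", "sample 2": "Matthew", "sample 3": "Luke"}
)
print(tally.most_common(), uniform)
```

Even a uniform tally would only be an input to the argument; what it implies about "Q" remains a matter of interpretation.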

All that I've said so far is that my post on Luke, by itself, did not reveal any seemingly-conclusive information. I haven't looked at Matthew in this way. I haven't actually made any effort to attempt to verify the existence or non-existence of Q (with stylometry), anyway. It was not my focus, by any means.
Bernard Muller wrote: For example, if you compare 1 John with John's gospel, the conclusion is likely to be that these two texts were written by the same author, due to the relatively close stylometry between the two.
However, if you compare John's gospel with Revelation, the conclusion is likely to be that these two texts were not written by the same author, due to the relatively remote stylometry between the two.
Yes, that's been a conclusion to a number of studies.
Bernard Muller wrote: Another thing which interests me:
According to your own study on 1 Clement http://peterkirby.com/a-study-in-1-clem ... l#more-461, can you compare the strictly Jewish part (22:1 to 41:2 minus the obvious Christian insertions in 24:1, 32:2, 33:4, 36:1 & 38:1) with the rest of 1 Clement?
That interests me also.
"... almost every critical biblical position was earlier advanced by skeptics." - Raymond Brown
Tenorikuma
Posts: 374
Joined: Thu Nov 14, 2013 6:40 am

Re: "Proving the Obvious" -- A Basic Stylometric Study

Post by Tenorikuma »

It would be interesting to compare some of the contested portions of Romans with the rest of the book.
Last edited by Tenorikuma on Wed May 27, 2015 7:18 pm, edited 1 time in total.

Re: "Proving the Obvious" -- A Basic Stylometric Study

Post by Peter Kirby »

Tenorikuma wrote: It would be interesting to compare some of the contested portions of Romans with the rest of the book.
Lots of interesting things to try -- right now I'm still working with the "less-interesting" things out there, which I find interesting for two reasons:

(1) the foundations of knowledge -- we don't really dig up these conclusions often, so perhaps it'd be nice to have some 'flying buttress' support here...

(2) the accuracy of the 'stylometer' program, its limits, and its operation (i.e., don't just take it for granted that the darn thing works!)

In a certain sense these two aims are in tension with each other, but it's more or less the way we do anything empirical: the 'good' results (known from other methods) confirm the tests as worthwhile tests, and the 'good' tests (known from other applications) confirm the results as worthwhile results, in a virtuous circle.

Technically, of course, I've given anyone who wants to give it a go the means with which to set up their own 'experiments' -- nothing would give you a feel for the limitations of this approach faster (and anyone who tries is likewise advised to start with an 'un'-interesting problem).

For example, it is very important to get the list of words right if you don't want to get nonsensical 'results'. To do that, you should start with some undisputed text of an author (a single work can be split up for this purpose) and give it multiple runs to find out what words discriminate best for that author. That's not something you'd necessarily know without doing the 'un-interesting' tests first. And it should temper our enthusiasm if/when we're not able to get multiple 'test readings' in the first place that show that a particular list of words is effectively identifying the works of a particular author.
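The 'un-interesting' test described above can be sketched roughly as follows: split an undisputed text into chunks, hold each chunk out in turn, and see how often a given word list still points back to the right author. This is only an illustration under stated assumptions, not the 'stylometer' program itself; the chunking, the Euclidean distance, and the names `chunks`, `profile`, `distance` and `held_out_accuracy` are all hypothetical.

```python
from collections import Counter
import math

def chunks(tokens, size):
    """Split tokens into consecutive chunks of `size`; a short remainder is dropped."""
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]

def profile(tokens, words):
    """Relative frequency of each word-list entry in the token list."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in words]

def distance(p, q):
    """Euclidean distance between two profiles (an assumed measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def held_out_accuracy(corpus, author, words, size):
    """Fraction of held-out chunks of `author` whose closest match is still `author`.

    corpus maps an author name to a list of tokens of undisputed text.
    """
    test_chunks = chunks(corpus[author], size)
    hits = 0
    for i, sample in enumerate(test_chunks):
        # The author's reference text is everything except the held-out chunk.
        rest = [tok for j, c in enumerate(test_chunks) if j != i for tok in c]
        refs = {name: toks for name, toks in corpus.items() if name != author}
        refs[author] = rest
        sample_profile = profile(sample, words)
        best = min(
            refs,
            key=lambda name: distance(sample_profile, profile(refs[name], words)),
        )
        hits += best == author
    return hits / len(test_chunks) if test_chunks else 0.0

# Toy tokens only; real runs need real texts and far larger samples.
toy = {
    "X": "a a a b a a a b a a a b".split(),
    "Y": "b b b a b b b a".split(),
}
for word_list in (["a", "b"], ["c"]):
    print(word_list, held_out_accuracy(toy, "X", word_list, size=4))
```

Running this over several candidate word lists and chunk sizes gives the multiple 'test readings' mentioned above; a word list that cannot recover the known author here should not be trusted on a disputed text.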

It's generally slow-going stuff, and requires vigilance against taking a wrong step.