Clement of Alexandria -- A Basic Stylometric Study

Tenorikuma
Posts: 374
Joined: Thu Nov 14, 2013 6:40 am

Re: Clement of Alexandria -- A Basic Stylometric Study

Post by Tenorikuma »

Very interesting.
Peter Kirby
Site Admin
Posts: 8020
Joined: Fri Oct 04, 2013 2:13 pm
Location: Santa Clara

Post by Peter Kirby »

Same thing again...

This time, with the 'fragments' of Clement that are contained in the TLG (4365 words, from various sources, mostly church fathers and catenae).

I didn't do an excellent job of separating the introductory words or the paraphrases from the actual, explicit quotes (although I did try).

In any case, the test identifies 'Clement' as the most likely author when these fragments are taken together as one large sample. This cannot be said to authenticate every fragment, however; it merely suggests that many or most of them are probably from Clement.
testsize: 4365
$VAR1 = 88; $VAR2 = 29; $VAR3 = 93; $VAR4 = 251; $VAR5 = 99; $VAR6 = 21; $VAR7 = 25; $VAR8 = 31; $VAR9 = 36; $VAR10 = 71; $VAR11 = 19; $VAR12 = 31; $VAR13 = 9; $VAR14 = 7; $VAR15 = 7; $VAR16 = 15; $VAR17 = 22; $VAR18 = 7; $VAR19 = 16; $VAR20 = 23;

20 Words
$VAR1 = [ 'AUTOS', 'AUTOU', 'AUTWi', 'AUTON', 'AUTOI', 'AUTWN', 'AUTOIS', 'AUTOUS', 'AUTH', 'AUTHS', 'AUTHi', 'AUTHN', 'AUTAI', 'AUTWN', 'AUTAIS', 'AUTAS', 'AUTO', 'AUTA' ]; $VAR2 = [ 'TIS', 'TINOS', 'TINI', 'TINA', 'TINES', 'TINWN', 'TISI', 'TISIN', 'TINAS', 'TI', 'TINA' ]; $VAR3 = [ 'EIMI', 'EI', 'ESTI', 'ESTIN', 'ESMEN', 'ESTE', 'EISI', 'EISIN', 'HN', 'HSQA', 'HN', 'HMEN', 'HTE', 'HSAN', 'ESOMAI', 'ESHi', 'ESEI', 'ESTAI', 'ESOMEQA', 'ESESQE', 'ESONTAI', 'W', 'HiS', 'Hi', 'WMEN', 'HTE', 'WSI', 'EIHN', 'EIHS', 'EIH', 'EIHMEN', 'EIMEN', 'EIHTE', 'EITE', 'EIHSAN', 'EIEN', 'ESOIMHN', 'ESOIO', 'ESOITO', 'ESOIMEQA', 'ESOISQE', 'ESOINTO', 'ISQI', 'ESTW', 'ESTE', 'ESTWN', 'ONTWN', 'ESTWSAN', 'EINAI', 'ESESQAI', 'WN', 'OUSA', 'ON', 'ESOMENOS', 'ESOMENH', 'ESOMENON' ]; $VAR4 = [ 'KAI' ]; $VAR5 = [ 'DE', 'D' ]; $VAR6 = [ 'MEN' ]; $VAR7 = [ 'ALLA', 'ALL' ]; $VAR8 = [ 'GAR' ]; $VAR9 = [ 'EIS' ]; $VAR10 = [ 'EN' ]; $VAR11 = [ 'EK', 'EC' ]; $VAR12 = [ 'PROS' ]; $VAR13 = [ 'OUN' ]; $VAR14 = [ 'INA' ]; $VAR15 = [ 'OTI' ]; $VAR16 = [ 'APO', 'AP' ]; $VAR17 = [ 'PERI' ]; $VAR18 = [ 'POLUS', 'POLLOU', 'POLLWi', 'POLUN', 'POLLH', 'POLLHS', 'POLLHi', 'POLLHN', 'POLU', 'POLLOU', 'POLLWi', 'POLU', 'POLLOI', 'POLLWN', 'POLLOIS', 'POLLOUS', 'POLLAI', 'POLLWN', 'POLLAIS', 'POLLAS', 'POLLA', 'POLLWN', 'POLLOIS', 'POLLA' ]; $VAR19 = [ 'PAS', 'PANTOS', 'PANTI', 'PANTA', 'PAS', 'PASA', 'PASHS', 'PASHi', 'PASAN', 'PASA', 'PAN', 'PANTOS', 'PANTI', 'PAN', 'PANTES', 'PANTWN', 'PASI', 'PASIN', 'PANTAS', 'PANTES', 'PASAI', 'PASWN', 'PASAIS', 'PASAS', 'PASAI', 'PANTA', 'PANTWN' ]; $VAR20 = [ 'EPI', 'EP' ];
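The counts at the top of this output are totals per word group: every form listed in a group (including elided forms such as D for DE) feeds the same counter. A minimal sketch of that kind of tally, assuming whitespace-separated tokens in the same uppercase transliteration. The function name, the three-group subset, and the toy sentence are illustrative only, not the script actually used.

```python
# Three of the twenty groups from the list above; each group pools its forms.
WORD_GROUPS = {
    "KAI": ["KAI"],
    "DE": ["DE", "D"],
    "ALLA": ["ALLA", "ALL"],
}

def count_groups(text, groups=WORD_GROUPS):
    """Return {group name: number of tokens matching any form in the group}."""
    # Invert the mapping so each surface form points at its group.
    form_to_group = {form: name for name, forms in groups.items() for form in forms}
    counts = {name: 0 for name in groups}
    for token in text.split():
        name = form_to_group.get(token)
        if name is not None:
            counts[name] += 1
    return counts

sample = "KAI O LOGOS HN ALL OU KAI D AUTOS DE"  # toy text, not from Clement
print(count_groups(sample))
```

Dividing each count by the test size would give the relative frequency that can be compared across samples of different lengths.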

Author Z-Score-Based P-Values
$VAR1 = '0.145413047886394'; $VAR2 = '0.000901138623231966'; $VAR3 = '0.0913020235022347';
Excellent match. Z-Score-Based P-Value > 0.10.

Control Z-Score-Based P-Values
$VAR1 = '0.0121911830943714'; $VAR2 = '0.00121881348131204'; $VAR3 = '0.118188400968307'; $VAR4 = '0.0498318645244918'; $VAR5 = '0.0131940811518082'; $VAR6 = '5.91405231787019e-12'; $VAR7 = '4.34609970100557e-05'; $VAR8 = '0'; $VAR9 = '1.95087693605705e-185'; $VAR10 = '1.68934449840894e-65'; $VAR11 = '1.09867528822474e-11'; $VAR12 = '4.4488774910846e-20'; $VAR13 = '1.47686290317326e-08'; $VAR14 = '0.00169556733417891'; $VAR15 = '0.00214808635364086'; $VAR16 = '0.0394381736188443'; $VAR17 = '0.0169018091435257'; $VAR18 = '5.30710670175691e-09'; $VAR19 = '0.000492395650257427'; $VAR20 = '1.39013748628716e-10'; $VAR21 = '5.18747184360314e-08'; $VAR22 = '9.64828528134546e-05'; $VAR23 = '0.0319428063927732'; $VAR24 = '0.000333708459071205'; $VAR25 = '0.0180331392672953'; $VAR26 = '0.0591573593950859'; $VAR27 = '4.31905406628548e-07'; $VAR28 = '0.028537343187936'; $VAR29 = '0.0235808410796549';
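The p-values above are labelled "Z-Score-Based", but this part of the thread does not show the formula behind them. Purely as an illustration, here is the standard two-sided p-value for a single word group derived from a z-score. The reference rates are made up, and how the twenty per-group values get combined into one number per candidate is not stated here, so this sketch deliberately stops at one group.

```python
import math
import statistics

def z_score(observed_rate, reference_rates):
    """Standardize an observed rate against a candidate's sample rates."""
    mean = statistics.mean(reference_rates)
    spread = statistics.stdev(reference_rates)  # sample standard deviation
    return (observed_rate - mean) / spread

def two_sided_p(z):
    """Two-sided tail probability of a standard normal at |z|."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical per-sample rates of one word group in a candidate's known work.
reference = [0.055, 0.060, 0.058, 0.062, 0.057]
observed = 251 / 4365  # KAI count over test size, from the output above
z = z_score(observed, reference)
print(round(z, 3), round(two_sided_p(z), 3))
```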

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.611966026556588'; $VAR2 = '0.00379241223983408'; $VAR3 = '0.384241561203578';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 1; $VAR2 = '0.551639789986688'; $VAR3 = 3; $VAR4 = '0.448360210013312';
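The "posterior probabilities from equal priors" above appear to be each candidate's p-value divided by the sum of the p-values being compared. That reading reproduces the printed figures. A short check (the function name is mine):

```python
def posteriors_from_equal_priors(p_values):
    """Normalize p-values so they sum to 1 (equal prior for each candidate)."""
    total = sum(p_values)
    return [p / total for p in p_values]

# Author p-values from the output above (three candidates).
authors = [0.145413047886394, 0.000901138623231966, 0.0913020235022347]
author_posteriors = posteriors_from_equal_priors(authors)
print(author_posteriors)

# Best author (candidate 1) against best control (control 3).
head_to_head = posteriors_from_equal_priors([0.145413047886394, 0.118188400968307])
print(head_to_head)
```

The first list matches the three "Bayesian Author Test" values, and the second matches the best-author versus best-control pair.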

Percentage of Samples in the Best Author Candidate that Meet the P-Value>0.1 Test, Z-Score-Based Method
1
Percentage of Samples outside the Best Author Candidate that Meet the P-Value>0.1 Test, Z-Score-Based Method
0.0463576158940397
Posterior Probability of a Sample Meeting the Test Being by the Best Author Candidate (with Prior = 0.5), not Any Other, Z-Score-Based Method
0.955696202531646

Percentage of Samples in the Second-Best Author Candidate that Meet the P-Value>0.1 Test, Z-Score-Based Method
0
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Second-Best Author, Z-Score-Based Method
1

Percentage of Samples in the Best Control Candidate that Meet the P-Value>0.1 Test, Z-Score-Based Method
0
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Best Control Author, Z-Score-Based Method
1
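The "posterior probability of a sample meeting the test" figures follow from Bayes' rule with a prior of 0.5: the share of the best candidate's samples that pass the test, divided by that share plus the share of rival samples that pass. A small sketch that reproduces the numbers above (the function name is mine):

```python
def posterior_given_pass(rate_best, rate_other, prior=0.5):
    """P(best candidate | sample passes the test), by Bayes' rule."""
    numerator = rate_best * prior
    denominator = numerator + rate_other * (1.0 - prior)
    return numerator / denominator

# Pass rates printed above for the p-value > 0.1 test.
vs_all_others = posterior_given_pass(1.0, 0.0463576158940397)
vs_second_best = posterior_given_pass(1.0, 0.0)
print(vs_all_others, vs_second_best)
```

With a rival pass rate of zero the posterior is exactly 1, which is why the second-best-author and best-control comparisons both print 1 here.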
"... almost every critical biblical position was earlier advanced by skeptics." - Raymond Brown

Post by Peter Kirby »

The Letter to Theodore....

As the small frequencies of the 'words' measured show, it is very difficult to use Clement's habits of style to identify a text only 749 words long.

This is so even though I didn't cull the incipit and the long quote of the 'secret gospel.'

With a 'z-score-based p-value < 0.1' exclusion test, only 10 of 32 candidates are excluded.

With a 'z-score-based p-value < 0.05' exclusion test, only 3 of 32 candidates are excluded.
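These two counts can be reproduced from the 32 z-score-based p-values printed later in this post (3 author candidates plus 29 controls). The values here are rounded to four decimals, and the function name is mine.

```python
# Z-score-based p-values for the 749-word test, rounded to four decimals.
AUTHORS = [0.2400, 0.1563, 0.2187]
CONTROLS = [
    0.0977, 0.0904, 0.1924, 0.2652, 0.2287, 0.0367, 0.0902, 0.0555,
    0.1093, 0.0006, 0.1319, 0.0078, 0.1114, 0.1045, 0.1181, 0.1228,
    0.1532, 0.0859, 0.0630, 0.1191, 0.0856, 0.1760, 0.1382, 0.1106,
    0.1786, 0.2406, 0.1353, 0.2085, 0.1517,
]

def excluded(p_values, threshold):
    """Count candidates whose p-value falls below the threshold."""
    return sum(1 for p in p_values if p < threshold)

candidates = AUTHORS + CONTROLS
for threshold in (0.1, 0.05):
    print(threshold, excluded(candidates, threshold), "of", len(candidates))
```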

With this method, at least, there is a huge difference between a 750-word sample and a 3000-word sample.
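One way to see why length matters so much: if each word-group count is treated as a binomial draw (a simplifying assumption of mine, not something stated in this thread), the standard error of an observed rate shrinks with the square root of the sample size. A quick sketch using the KAI rate from this test:

```python
import math

def rate_standard_error(rate, n_words):
    """Binomial standard error of a relative frequency over n_words tokens."""
    return math.sqrt(rate * (1.0 - rate) / n_words)

kai_rate = 45 / 749  # KAI count over test size in the output that follows
se_short = rate_standard_error(kai_rate, 750)
se_long = rate_standard_error(kai_rate, 3000)
print(round(se_short, 4), round(se_long, 4), round(se_short / se_long, 2))
```

Quadrupling the sample halves the uncertainty on each rate, so a 750-word text leaves every one of the twenty rates about twice as noisy as a 3000-word text would.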
testsize: 749
$VAR1 = 14; $VAR2 = 3; $VAR3 = 15; $VAR4 = 45; $VAR5 = 15; $VAR6 = 3; $VAR7 = 4; $VAR8 = 9; $VAR9 = 0; $VAR10 = 5; $VAR11 = 7; $VAR12 = 0; $VAR13 = 3; $VAR14 = 1; $VAR15 = 0; $VAR16 = 2; $VAR17 = 3; $VAR18 = 1; $VAR19 = 4; $VAR20 = 2;

20 Words
$VAR1 = [ 'AUTOS', 'AUTOU', 'AUTWi', 'AUTON', 'AUTOI', 'AUTWN', 'AUTOIS', 'AUTOUS', 'AUTH', 'AUTHS', 'AUTHi', 'AUTHN', 'AUTAI', 'AUTWN', 'AUTAIS', 'AUTAS', 'AUTO', 'AUTA' ]; $VAR2 = [ 'TIS', 'TINOS', 'TINI', 'TINA', 'TINES', 'TINWN', 'TISI', 'TISIN', 'TINAS', 'TI', 'TINA' ]; $VAR3 = [ 'EIMI', 'EI', 'ESTI', 'ESTIN', 'ESMEN', 'ESTE', 'EISI', 'EISIN', 'HN', 'HSQA', 'HN', 'HMEN', 'HTE', 'HSAN', 'ESOMAI', 'ESHi', 'ESEI', 'ESTAI', 'ESOMEQA', 'ESESQE', 'ESONTAI', 'W', 'HiS', 'Hi', 'WMEN', 'HTE', 'WSI', 'EIHN', 'EIHS', 'EIH', 'EIHMEN', 'EIMEN', 'EIHTE', 'EITE', 'EIHSAN', 'EIEN', 'ESOIMHN', 'ESOIO', 'ESOITO', 'ESOIMEQA', 'ESOISQE', 'ESOINTO', 'ISQI', 'ESTW', 'ESTE', 'ESTWN', 'ONTWN', 'ESTWSAN', 'EINAI', 'ESESQAI', 'WN', 'OUSA', 'ON', 'ESOMENOS', 'ESOMENH', 'ESOMENON' ]; $VAR4 = [ 'KAI' ]; $VAR5 = [ 'DE', 'D' ]; $VAR6 = [ 'MEN' ]; $VAR7 = [ 'ALLA', 'ALL' ]; $VAR8 = [ 'GAR' ]; $VAR9 = [ 'EIS' ]; $VAR10 = [ 'EN' ]; $VAR11 = [ 'EK', 'EC' ]; $VAR12 = [ 'PROS' ]; $VAR13 = [ 'OUN' ]; $VAR14 = [ 'INA' ]; $VAR15 = [ 'OTI' ]; $VAR16 = [ 'APO', 'AP' ]; $VAR17 = [ 'PERI' ]; $VAR18 = [ 'POLUS', 'POLLOU', 'POLLWi', 'POLUN', 'POLLH', 'POLLHS', 'POLLHi', 'POLLHN', 'POLU', 'POLLOU', 'POLLWi', 'POLU', 'POLLOI', 'POLLWN', 'POLLOIS', 'POLLOUS', 'POLLAI', 'POLLWN', 'POLLAIS', 'POLLAS', 'POLLA', 'POLLWN', 'POLLOIS', 'POLLA' ]; $VAR19 = [ 'PAS', 'PANTOS', 'PANTI', 'PANTA', 'PAS', 'PASA', 'PASHS', 'PASHi', 'PASAN', 'PASA', 'PAN', 'PANTOS', 'PANTI', 'PAN', 'PANTES', 'PANTWN', 'PASI', 'PASIN', 'PANTAS', 'PANTES', 'PASAI', 'PASWN', 'PASAIS', 'PASAS', 'PASAI', 'PANTA', 'PANTWN' ]; $VAR20 = [ 'EPI', 'EP' ];

Author Z-Score-Based P-Values
$VAR1 = '0.239955979770313'; $VAR2 = '0.15627245207971'; $VAR3 = '0.21868928826961';

Control Z-Score-Based P-Values
$VAR1 = '0.097729840149245'; $VAR2 = '0.0903563825906666'; $VAR3 = '0.192363798619605'; $VAR4 = '0.265169803857623'; $VAR5 = '0.228744745288692'; $VAR6 = '0.036703429649263'; $VAR7 = '0.0901719041523447'; $VAR8 = '0.0554698680497574'; $VAR9 = '0.109323876919529'; $VAR10 = '0.000566835398378018'; $VAR11 = '0.131945576748502'; $VAR12 = '0.00783067630839334'; $VAR13 = '0.111350069835556'; $VAR14 = '0.104457453482461'; $VAR15 = '0.118129003101242'; $VAR16 = '0.122765927768801'; $VAR17 = '0.153215556560225'; $VAR18 = '0.0859315683074529'; $VAR19 = '0.0629689701000199'; $VAR20 = '0.119123175545919'; $VAR21 = '0.0856285914811254'; $VAR22 = '0.175958863093962'; $VAR23 = '0.13824411417826'; $VAR24 = '0.110578942734896'; $VAR25 = '0.178642716134681'; $VAR26 = '0.240643005597686'; $VAR27 = '0.13530828297217'; $VAR28 = '0.208473672498232'; $VAR29 = '0.151730655959346';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.390224532354717'; $VAR2 = '0.254135548491442'; $VAR3 = '0.35563991915384';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 1; $VAR2 = '0.475042034177885'; $VAR3 = 4; $VAR4 = '0.524957965822115';

Percentage of Samples in the Best Author Candidate that Meet the P-Value>0.2 Test, Z-Score-Based Method
0.695402298850575
Percentage of Samples outside the Best Author Candidate that Meet the P-Value>0.2 Test, Z-Score-Based Method
0.0919540229885057
Posterior Probability of a Sample Meeting the Test Being by the Best Author Candidate (with Prior = 0.5), not Any Other, Z-Score-Based Method
0.883211678832117

Percentage of Samples in the Second-Best Author Candidate that Meet the P-Value>0.2 Test, Z-Score-Based Method
0.0833333333333333
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Second-Best Author, Z-Score-Based Method
0.892988929889299

Percentage of Samples in the Best Control Candidate that Meet the P-Value>0.2 Test, Z-Score-Based Method
0.0357142857142857
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Best Control Author, Z-Score-Based Method
0.95115103874228

Author Chi-Square-Based P-Values
$VAR1 = '0.897126717178744'; $VAR2 = '0.876686010922532'; $VAR3 = '0.766954128799045';

Control Chi-Square-Based P-Values
$VAR1 = '0.137898770544387'; $VAR2 = '0.263214501782374'; $VAR3 = '0.97251600690767'; $VAR4 = '0.996383635782013'; $VAR5 = '0.995727137880758'; $VAR6 = '0.000948092883388932'; $VAR7 = '0.253210360195565'; $VAR8 = '0.00852822453421988'; $VAR9 = '0.65938916375342'; $VAR10 = 0; $VAR11 = '0.346606811740825'; $VAR12 = '1.0011159413525e-12'; $VAR13 = '0.00377297557020349'; $VAR14 = '0.169854249175826'; $VAR15 = '0.599467656898274'; $VAR16 = '0.493172636244664'; $VAR17 = '0.912577608896469'; $VAR18 = '0.277004254454816'; $VAR19 = '0.00103826061119019'; $VAR20 = '0.149481145224522'; $VAR21 = '0.000251868655303691'; $VAR22 = '0.851018068030414'; $VAR23 = '0.697616764950275'; $VAR24 = '0.442629632622023'; $VAR25 = '0.993407920946543'; $VAR26 = '0.98246524429608'; $VAR27 = '0.118313132250566'; $VAR28 = '0.998364840186754'; $VAR29 = '0.8885746060188';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Chi-Square-Based Method
$VAR1 = '0.353092891912648'; $VAR2 = '0.345047798676054'; $VAR3 = '0.301859309411298';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Chi-Square-Based Method
$VAR1 = 1; $VAR2 = '0.473295021385187'; $VAR3 = 28; $VAR4 = '0.526704978614813';

Percentage of Samples in the Best Author Candidate that Meet the P-Value>0.7 Test, Chi-Square-Based Method
0.35632183908046
Percentage of Samples outside the Best Author Candidate that Meet the P-Value>0.7 Test, Chi-Square-Based Method
0.0188087774294671
Posterior Probability of a Sample Meeting the Test Being by the Best Author Candidate (with Prior = 0.5), not Any Other, Chi-Square-Based Method
0.949860724233983

Percentage of Samples in the Second-Best Author Candidate that Meet the P-Value>0.7 Test, Chi-Square-Based Method
0
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Second-Best Author, Chi-Square-Based Method
1

Percentage of Samples in the Best Control Candidate that Meet the P-Value>0.7 Test, Chi-Square-Based Method
0
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Best Control Author, Chi-Square-Based Method
1
The situation does not improve even if you remove the incipit and the long quote of the 'secret gospel' and add some more 'words' to be counted.
testsize: 585
$VAR1 = 9; $VAR2 = 3; $VAR3 = 11; $VAR4 = 31; $VAR5 = 12; $VAR6 = 3; $VAR7 = 4; $VAR8 = 7; $VAR9 = 0; $VAR10 = 5; $VAR11 = 3; $VAR12 = 0; $VAR13 = 3; $VAR14 = 0; $VAR15 = 0; $VAR16 = 1; $VAR17 = 3; $VAR18 = 1; $VAR19 = 4; $VAR20 = 1; $VAR21 = 40; $VAR22 = 30; $VAR23 = 0; $VAR24 = 11; $VAR25 = 3; $VAR26 = 1; $VAR27 = 7; $VAR28 = 3;

28 Words
$VAR1 = [ 'AUTOS', 'AUTOU', 'AUTWi', 'AUTON', 'AUTOI', 'AUTWN', 'AUTOIS', 'AUTOUS', 'AUTH', 'AUTHS', 'AUTHi', 'AUTHN', 'AUTAI', 'AUTWN', 'AUTAIS', 'AUTAS', 'AUTO', 'AUTA' ]; $VAR2 = [ 'TIS', 'TINOS', 'TINI', 'TINA', 'TINES', 'TINWN', 'TISI', 'TISIN', 'TINAS', 'TI', 'TINA' ]; $VAR3 = [ 'EIMI', 'EI', 'ESTI', 'ESTIN', 'ESMEN', 'ESTE', 'EISI', 'EISIN', 'HN', 'HSQA', 'HN', 'HMEN', 'HTE', 'HSAN', 'ESOMAI', 'ESHi', 'ESEI', 'ESTAI', 'ESOMEQA', 'ESESQE', 'ESONTAI', 'W', 'HiS', 'Hi', 'WMEN', 'HTE', 'WSI', 'EIHN', 'EIHS', 'EIH', 'EIHMEN', 'EIMEN', 'EIHTE', 'EITE', 'EIHSAN', 'EIEN', 'ESOIMHN', 'ESOIO', 'ESOITO', 'ESOIMEQA', 'ESOISQE', 'ESOINTO', 'ISQI', 'ESTW', 'ESTE', 'ESTWN', 'ONTWN', 'ESTWSAN', 'EINAI', 'ESESQAI', 'WN', 'OUSA', 'ON', 'ESOMENOS', 'ESOMENH', 'ESOMENON' ]; $VAR4 = [ 'KAI' ]; $VAR5 = [ 'DE', 'D' ]; $VAR6 = [ 'MEN' ]; $VAR7 = [ 'ALLA', 'ALL' ]; $VAR8 = [ 'GAR' ]; $VAR9 = [ 'EIS' ]; $VAR10 = [ 'EN' ]; $VAR11 = [ 'EK', 'EC' ]; $VAR12 = [ 'PROS' ]; $VAR13 = [ 'OUN' ]; $VAR14 = [ 'INA' ]; $VAR15 = [ 'OTI' ]; $VAR16 = [ 'APO', 'AP' ]; $VAR17 = [ 'PERI' ]; $VAR18 = [ 'POLUS', 'POLLOU', 'POLLWi', 'POLUN', 'POLLH', 'POLLHS', 'POLLHi', 'POLLHN', 'POLU', 'POLLOU', 'POLLWi', 'POLU', 'POLLOI', 'POLLWN', 'POLLOIS', 'POLLOUS', 'POLLAI', 'POLLWN', 'POLLAIS', 'POLLAS', 'POLLA', 'POLLWN', 'POLLOIS', 'POLLA' ]; $VAR19 = [ 'PAS', 'PANTOS', 'PANTI', 'PANTA', 'PAS', 'PASA', 'PASHS', 'PASHi', 'PASAN', 'PASA', 'PAN', 'PANTOS', 'PANTI', 'PAN', 'PANTES', 'PANTWN', 'PASI', 'PASIN', 'PANTAS', 'PANTES', 'PASAI', 'PASWN', 'PASAIS', 'PASAS', 'PASAI', 'PANTA', 'PANTWN' ]; $VAR20 = [ 'EPI', 'EP' ]; $VAR21 = [ 'O', 'OI', 'H', 'AI', 'TO', 'TA' ]; $VAR22 = [ 'TOU', 'TWN', 'THS' ]; $VAR23 = [ 'TWi', 'TOIS', 'THi', 'TAIS' ]; $VAR24 = [ 'TON', 'TOUS', 'THN', 'TAS' ]; $VAR25 = [ 'OUTOS', 'TOUTOU', 'TOUTWi', 'TOUTON', 'AUTOI', 'TOUTWN', 'TOUTOIS', 'TOUTOUS', 'AUTH', 'TAUTHS', 'TAUTHi', 'TAUTHN', 'AUTAI', 'TAUTAIS', 'TAUTAS', 'TOUTO', 'TOUTO', 'TAUTA' ]; $VAR26 = [ 'TE' ]; $VAR27 = [ 'KATA', 'KAT', 
'KAQ' ]; $VAR28 = [ 'DIA', 'DI' ];

Author Z-Score-Based P-Values
$VAR1 = '0.238966466484185'; $VAR2 = '0.180860494019055'; $VAR3 = '0.178908540891056';

Control Z-Score-Based P-Values
$VAR1 = '0.109399101885252'; $VAR2 = '0.0870618496838642'; $VAR3 = '0.157635860502871'; $VAR4 = '0.271300166643239'; $VAR5 = '0.168735908605299'; $VAR6 = '0.0330976885518663'; $VAR7 = '0.0439533347287748'; $VAR8 = '0.11798577811795'; $VAR9 = '0.101263307889919'; $VAR10 = '0.00142063415628147'; $VAR11 = '0.126728996972072'; $VAR12 = '0.0124400065499432'; $VAR13 = '0.0971078780086916'; $VAR14 = '0.116118473265528'; $VAR15 = '0.112225272240502'; $VAR16 = '0.0945957842924575'; $VAR17 = '0.182692612311113'; $VAR18 = '0.0750775225264509'; $VAR19 = '0.0611908075683004'; $VAR20 = '0.113578531394736'; $VAR21 = '0.0847478969047339'; $VAR22 = '0.135621238987263'; $VAR23 = '0.0901770694789485'; $VAR24 = '0.124684537842573'; $VAR25 = '0.18824496566581'; $VAR26 = '0.205067999901377'; $VAR27 = '0.0966427618490444'; $VAR28 = '0.200584311062807'; $VAR29 = '0.104617619995548';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.399118585632045'; $VAR2 = '0.302070770144545'; $VAR3 = '0.29881064422341';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 1; $VAR2 = '0.468316858226766'; $VAR3 = 4; $VAR4 = '0.531683141773234';

Percentage of Samples in the Best Author Candidate that Meet the P-Value>0.2 Test, Z-Score-Based Method
0.757847533632287
Percentage of Samples outside the Best Author Candidate that Meet the P-Value>0.2 Test, Z-Score-Based Method
0.0943856794141578
Posterior Probability of a Sample Meeting the Test Being by the Best Author Candidate (with Prior = 0.5), not Any Other, Z-Score-Based Method
0.889249001365763

Percentage of Samples in the Second-Best Author Candidate that Meet the P-Value>0.2 Test, Z-Score-Based Method
0
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Second-Best Author, Z-Score-Based Method
1

Percentage of Samples in the Best Control Candidate that Meet the P-Value>0.2 Test, Z-Score-Based Method
0.0555555555555556
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Best Control Author, Z-Score-Based Method
0.931699846860643
The most that can be said is that this doesn't disprove Clementine authorship.

It's possible that more advanced stylometric techniques could provide useful information. Some studies have had good results with English texts of 250 words.

Post by Peter Kirby »

The 'Hymnus Christi servatoris' and the 'Eclogae propheticae' were also too short to be tested with this method.

There are no other works attributed to Clement of Alexandria in the TLG.

Post by Peter Kirby »

To complete the circle, here are the results for the samples from the 'Protrepticus' and the 'Paedagogus', compared against the 'Stromata' with the same 20 words used in all the other tests.

Accuracy has improved: measured against the included controls, it rises from 80% to 100% when using the 20-word list and comparing against the 'Stromata.'
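Assuming "accuracy" here means the share of test samples for which the best author candidate outscores the best control (my reading, not something this post states), it can be tallied from the "Bayesian Comparison of Best Author to Best Control" lines in the output that follows. The values are the best-author posteriors from the first ten such lines, rounded to four decimals.

```python
# Best-author posteriors from the first ten "Bayesian Comparison" lines, rounded.
BEST_AUTHOR_VS_CONTROL = [
    0.5613, 0.5313, 0.6061, 0.5881, 0.7076,
    0.6200, 0.5060, 0.5834, 0.5508, 0.6713,
]

def accuracy(posteriors, cutoff=0.5):
    """Share of samples where the best author beats the best control."""
    wins = sum(1 for p in posteriors if p > cutoff)
    return wins / len(posteriors)

print(accuracy(BEST_AUTHOR_VS_CONTROL))
```

Several of these margins are thin (0.5060 is barely above even), so a perfect tally on this measure still says little about how decisive each individual result is.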
Author Z-Score-Based P-Values
$VAR1 = '0.175632194388122'; $VAR2 = '0.0273226233915555'; $VAR3 = '0.0581206776989506';
Excellent match. Z-Score-Based P-Value > 0.10.

Control Z-Score-Based P-Values
$VAR1 = '0.02782059302406'; $VAR2 = '0.00143590353354502'; $VAR3 = '0.0562975433097471'; $VAR4 = '0.137281471036673'; $VAR5 = '0.0513706697595692'; $VAR6 = '7.3655989200826e-09'; $VAR7 = '5.53219519454619e-06'; $VAR8 = '5.57131947192697e-28'; $VAR9 = '4.69322834774578e-184'; $VAR10 = '7.34148182893381e-28'; $VAR11 = '0.0450709140156383'; $VAR12 = '2.39311758232927e-11'; $VAR13 = '0.0142592416277771'; $VAR14 = '0.0342824655640593'; $VAR15 = '0.0809509758911259'; $VAR16 = '0.0307824505324834'; $VAR17 = '0.0794659503371223'; $VAR18 = '0.00240605043289266'; $VAR19 = '0.000381490029783153'; $VAR20 = '0.000180597952357559'; $VAR21 = '2.06575862837364e-05'; $VAR22 = '0.00371250716450203'; $VAR23 = '0.0349457220931891'; $VAR24 = '0.0149668300751288'; $VAR25 = '0.027841794237935'; $VAR26 = '0.101402935447586'; $VAR27 = '0.000445600243109259'; $VAR28 = '0.0803367537675786'; $VAR29 = '0.0237382690533745';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.672725695937631'; $VAR2 = '0.104654109116848'; $VAR3 = '0.222620194945521';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 1; $VAR2 = '0.56128003917532'; $VAR3 = 4; $VAR4 = '0.43871996082468';
Author Z-Score-Based P-Values
$VAR1 = '0.107237611670189'; $VAR2 = '0.0286922493008972'; $VAR3 = '0.047443351245165';
Excellent match. Z-Score-Based P-Value > 0.10.

Control Z-Score-Based P-Values
$VAR1 = '0.00666168380760475'; $VAR2 = '0.00922277109137771'; $VAR3 = '0.0946030528784022'; $VAR4 = '0.0756709010836628'; $VAR5 = '0.00448245294473958'; $VAR6 = '1.46379030258297e-10'; $VAR7 = '8.89074884869157e-07'; $VAR8 = '5.49790383615881e-27'; $VAR9 = '8.45603081909304e-215'; $VAR10 = '2.37539157284602e-29'; $VAR11 = '0.00459961966469155'; $VAR12 = '6.80016274498375e-15'; $VAR13 = '0.0121051010844485'; $VAR14 = '0.00771373778319094'; $VAR15 = '0.0209650467794752'; $VAR16 = '0.0243188214625438'; $VAR17 = '0.0714454958850729'; $VAR18 = '0.000151037766570767'; $VAR19 = '0.0028847489834993'; $VAR20 = '9.7450111833517e-07'; $VAR21 = '4.04850190277198e-06'; $VAR22 = '0.000598177410150459'; $VAR23 = '0.0433457634106286'; $VAR24 = '0.00469730414614475'; $VAR25 = '0.0264803134684096'; $VAR26 = '0.064698057857405'; $VAR27 = '8.38764669458983e-06'; $VAR28 = '0.0624867640809138'; $VAR29 = '0.0386274976616659';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.584805219770727'; $VAR2 = '0.156469142652421'; $VAR3 = '0.258725637576853';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 1; $VAR2 = '0.531298348179846'; $VAR3 = 3; $VAR4 = '0.468701651820154';
Author Z-Score-Based P-Values
$VAR1 = '0.0824682709412135'; $VAR2 = '0.00415072678535048'; $VAR3 = '0.0187909211667257';
Good match. Z-Score-Based P-Value > 0.05.

Control Z-Score-Based P-Values
$VAR1 = '0.0389219731744169'; $VAR2 = '0.0003349616659174'; $VAR3 = '0.0342253859865855'; $VAR4 = '0.0183701688572872'; $VAR5 = '0.00597865831662005'; $VAR6 = '3.92333131016311e-11'; $VAR7 = '2.13610617664352e-10'; $VAR8 = '3.76665695464297e-15'; $VAR9 = '1.2732853591179e-257'; $VAR10 = '9.28539727412726e-52'; $VAR11 = '3.23230923980392e-05'; $VAR12 = '1.53666755200642e-23'; $VAR13 = '0.00119991028336151'; $VAR14 = '0.00914552955304082'; $VAR15 = '0.0270038887302336'; $VAR16 = '0.028166406661062'; $VAR17 = '0.053595433790182'; $VAR18 = '2.10481254546187e-05'; $VAR19 = '0.00265770638746398'; $VAR20 = '2.4750826562569e-08'; $VAR21 = '5.77367602095181e-06'; $VAR22 = '5.36635825253591e-05'; $VAR23 = '0.0406304006242715'; $VAR24 = '0.000412424312074317'; $VAR25 = '0.013711113765813'; $VAR26 = '0.035887261186336'; $VAR27 = '5.31144381825819e-07'; $VAR28 = '0.0250959668833924'; $VAR29 = '0.018836078848505';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.782357787645194'; $VAR2 = '0.0393770038809385'; $VAR3 = '0.178265208473867';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 1; $VAR2 = '0.606100437320995'; $VAR3 = 17; $VAR4 = '0.393899562679005';
Author Z-Score-Based P-Values
$VAR1 = '0.110505697189735'; $VAR2 = '0.00748399482117353'; $VAR3 = '0.0611989383310658';
Excellent match. Z-Score-Based P-Value > 0.10.

Control Z-Score-Based P-Values
$VAR1 = '0.0267588410461724'; $VAR2 = '0.00148898973648727'; $VAR3 = '0.0708441011321246'; $VAR4 = '0.0756945198806317'; $VAR5 = '0.0011920579033652'; $VAR6 = '1.43073955047932e-08'; $VAR7 = '9.97631933544808e-06'; $VAR8 = '7.53174870767207e-10'; $VAR9 = '2.05888736666225e-206'; $VAR10 = '4.04857688592491e-107'; $VAR11 = '1.74483448605373e-05'; $VAR12 = '7.98025186440753e-15'; $VAR13 = '0.0015908691420342'; $VAR14 = '0.00623029755574913'; $VAR15 = '0.00363462025473137'; $VAR16 = '0.00946516184224697'; $VAR17 = '0.0285526958293264'; $VAR18 = '6.566063003799e-08'; $VAR19 = '0.000508312911662489'; $VAR20 = '6.14159775089668e-07'; $VAR21 = '1.76996420171829e-06'; $VAR22 = '9.78404536893078e-05'; $VAR23 = '0.0174421810615398'; $VAR24 = '0.00665264480319697'; $VAR25 = '0.0219566166244337'; $VAR26 = '0.0774009201533712'; $VAR27 = '9.94921262971493e-05'; $VAR28 = '0.0420291584419999'; $VAR29 = '0.0472926066675885';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.616700384275718'; $VAR2 = '0.0417660138753815'; $VAR3 = '0.3415336018489';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 1; $VAR2 = '0.588088374705604'; $VAR3 = 26; $VAR4 = '0.411911625294396';
Author Z-Score-Based P-Values
$VAR1 = '0.164792909897107'; $VAR2 = '0.00760488018849116'; $VAR3 = '0.0673888011239041';
Excellent match. Z-Score-Based P-Value > 0.10.

Control Z-Score-Based P-Values
$VAR1 = '0.023468038624533'; $VAR2 = '0.00749130426931931'; $VAR3 = '0.0572563341476108'; $VAR4 = '0.0680963477880058'; $VAR5 = '0.0108787884338907'; $VAR6 = '7.31342952576999e-09'; $VAR7 = '1.91002559491324e-07'; $VAR8 = '2.35156862199798e-08'; $VAR9 = '6.86730217887237e-234'; $VAR10 = '3.80292747137152e-87'; $VAR11 = '0.0268106999086168'; $VAR12 = '3.40164364827471e-15'; $VAR13 = '0.00196760327485455'; $VAR14 = '0.00969108704187161'; $VAR15 = '0.0254228285349839'; $VAR16 = '0.015421520521892'; $VAR17 = '0.0633911621733293'; $VAR18 = '3.02657136991658e-06'; $VAR19 = '6.1521085403499e-05'; $VAR20 = '1.99848870657279e-08'; $VAR21 = '8.7881056958383e-06'; $VAR22 = '0.000442141462464159'; $VAR23 = '0.0208185841403742'; $VAR24 = '6.40420543723858e-05'; $VAR25 = '0.00883412619936611'; $VAR26 = '0.0527926556411981'; $VAR27 = '0.00153265912802022'; $VAR28 = '0.0375380050831736'; $VAR29 = '0.0241730962815983';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.687248227959197'; $VAR2 = '0.0317152020475022'; $VAR3 = '0.2810365699933';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 1; $VAR2 = '0.707602023103709'; $VAR3 = 4; $VAR4 = '0.292397976896291';
Author Z-Score-Based P-Values
$VAR1 = '0.111189725817848'; $VAR2 = '0.000845600059691961'; $VAR3 = '0.0422350494237638';
Excellent match. Z-Score-Based P-Value > 0.10.

Control Z-Score-Based P-Values
$VAR1 = '0.0229339536983116'; $VAR2 = '0.00888533667285483'; $VAR3 = '0.0394333192376367'; $VAR4 = '0.0657086154664528'; $VAR5 = '0.0122378536651146'; $VAR6 = '2.61569854993523e-11'; $VAR7 = '6.69161351099993e-08'; $VAR8 = '2.83756722376003e-319'; $VAR9 = '3.22388100936173e-167'; $VAR10 = '1.11451195468968e-71'; $VAR11 = '0.020186824935264'; $VAR12 = '2.77423960941059e-14'; $VAR13 = '0.00223621536974783'; $VAR14 = '0.0285484977309586'; $VAR15 = '0.0421428132162825'; $VAR16 = '0.014833802838586'; $VAR17 = '0.0681405557451809'; $VAR18 = '0.000421100408442739'; $VAR19 = '0.0001080681088237'; $VAR20 = '8.19479176409627e-09'; $VAR21 = '8.13360135333622e-07'; $VAR22 = '5.57483226888127e-05'; $VAR23 = '0.0320242461619289'; $VAR24 = '1.0273410564072e-06'; $VAR25 = '0.025884931004085'; $VAR26 = '0.0280163611294132'; $VAR27 = '2.66974262965076e-05'; $VAR28 = '0.0202517252924613'; $VAR29 = '0.00543958627510071';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.720745804894066'; $VAR2 = '0.00548128607349551'; $VAR3 = '0.273772909032438';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 1; $VAR2 = '0.620027609663726'; $VAR3 = 17; $VAR4 = '0.379972390336273';
Author Z-Score-Based P-Values
$VAR1 = '0.0642111781138632'; $VAR2 = '0.00171045823223661'; $VAR3 = '0.0262683792041696';
Good match. Z-Score-Based P-Value > 0.05.

Control Z-Score-Based P-Values
$VAR1 = '0.00318373458339036'; $VAR2 = '0.00173855032088974'; $VAR3 = '0.0406571725602829'; $VAR4 = '0.0626892003297425'; $VAR5 = '0.0204993894071774'; $VAR6 = '8.6732028754175e-11'; $VAR7 = '2.20942891016228e-08'; $VAR8 = '0'; $VAR9 = '6.91868498042178e-198'; $VAR10 = '4.57198095250288e-95'; $VAR11 = '0.0003071947532923'; $VAR12 = '3.18816436108696e-16'; $VAR13 = '0.000386262698552709'; $VAR14 = '0.0156242059972711'; $VAR15 = '0.0231832109020983'; $VAR16 = '0.0202811440292099'; $VAR17 = '0.0492366831245109'; $VAR18 = '0.000572122742976243'; $VAR19 = '9.85928119304763e-05'; $VAR20 = '2.09728807677249e-10'; $VAR21 = '1.52370195834476e-06'; $VAR22 = '2.42475148711906e-05'; $VAR23 = '0.0207737786880624'; $VAR24 = '7.97473750372443e-08'; $VAR25 = '0.0148983622868027'; $VAR26 = '0.0232820621036027'; $VAR27 = '5.59564607651031e-06'; $VAR28 = '0.0211146789481579'; $VAR29 = '0.00915442903199711';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.69650902790932'; $VAR2 = '0.0185536169185689'; $VAR3 = '0.284937355172111';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 1; $VAR2 = '0.505996742495126'; $VAR3 = 4; $VAR4 = '0.494003257504874';
Author Z-Score-Based P-Values
$VAR1 = '0.114069128558597'; $VAR2 = '0.000211773752275666'; $VAR3 = '0.0429219384827756';
Excellent match. Z-Score-Based P-Value > 0.10.

Control Z-Score-Based P-Values
$VAR1 = '0.023981006118095'; $VAR2 = '0.00450192191987152'; $VAR3 = '0.0677401614979971'; $VAR4 = '0.0814486772075348'; $VAR5 = '0.0227528913120345'; $VAR6 = '1.49939287563297e-14'; $VAR7 = '7.85235818813492e-09'; $VAR8 = '2.71736105212686e-322'; $VAR9 = '4.30014626484274e-191'; $VAR10 = '1.15051097141332e-260'; $VAR11 = '0.0108836335407889'; $VAR12 = '5.14375463919645e-15'; $VAR13 = '0.000425339743850498'; $VAR14 = '0.0132251701372594'; $VAR15 = '0.0154913736920536'; $VAR16 = '0.0187808000880381'; $VAR17 = '0.075004461663421'; $VAR18 = '2.24403725348054e-07'; $VAR19 = '0.000611657482548781'; $VAR20 = '1.50520063005187e-06'; $VAR21 = '4.48382733360271e-05'; $VAR22 = '0.000775985876828896'; $VAR23 = '0.0439394981632955'; $VAR24 = '7.43790486546224e-05'; $VAR25 = '0.0183163824989847'; $VAR26 = '0.0138115026410691'; $VAR27 = '0.000424127477336679'; $VAR28 = '0.033727989915962'; $VAR29 = '0.0129456845039793';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.725617476011961'; $VAR2 = '0.00134713692962871'; $VAR3 = '0.273035387058411';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 1; $VAR2 = '0.583420666530192'; $VAR3 = 4; $VAR4 = '0.416579333469808';
Author Z-Score-Based P-Values
$VAR1 = '0.112738369663818'; $VAR2 = '0.000449551489882513'; $VAR3 = '0.0488870564633003';
Excellent match. Z-Score-Based P-Value > 0.10.

Control Z-Score-Based P-Values
$VAR1 = '0.0196374267786769'; $VAR2 = '0.0167984666467129'; $VAR3 = '0.0303305891008324'; $VAR4 = '0.0444762499342243'; $VAR5 = '0.0148277488813066'; $VAR6 = '1.53154138579379e-19'; $VAR7 = '1.39407585706268e-09'; $VAR8 = '0'; $VAR9 = '3.48769270905702e-208'; $VAR10 = '1.51179128830293e-138'; $VAR11 = '0.0144677258680161'; $VAR12 = '3.34957490776342e-18'; $VAR13 = '0.000884156524790406'; $VAR14 = '0.00964426731428726'; $VAR15 = '0.0191931377010463'; $VAR16 = '0.0342878507255054'; $VAR17 = '0.0919495746233905'; $VAR18 = '1.21511880282215e-18'; $VAR19 = '0.000362061302497777'; $VAR20 = '2.30204157673836e-10'; $VAR21 = '1.17598339232554e-06'; $VAR22 = '3.75817879417231e-05'; $VAR23 = '0.028263450053101'; $VAR24 = '8.41715298929864e-07'; $VAR25 = '0.0114909436013211'; $VAR26 = '0.0247940148552579'; $VAR27 = '8.44962130265694e-07'; $VAR28 = '0.0386907418899151'; $VAR29 = '0.0068774786367488';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.695593924006146'; $VAR2 = '0.00277372544789022'; $VAR3 = '0.301632350545964';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 1; $VAR2 = '0.550781679186873'; $VAR3 = 17; $VAR4 = '0.449218320813127';
Author Z-Score-Based P-Values
$VAR1 = '0.171093312981604'; $VAR2 = '2.93984516386849e-05'; $VAR3 = '0.0261742602428154';
Excellent match. Z-Score-Based P-Value > 0.10.

Control Z-Score-Based P-Values
$VAR1 = '0.017468788308475'; $VAR2 = '0.00209618926795158'; $VAR3 = '0.0377896977118809'; $VAR4 = '0.0705380117864'; $VAR5 = '0.028081682210077'; $VAR6 = '2.05457924427615e-22'; $VAR7 = '4.30162075490204e-08'; $VAR8 = '0'; $VAR9 = '2.97553797309504e-223'; $VAR10 = '6.35647516766444e-297'; $VAR11 = '0.00711000175668741'; $VAR12 = '4.67303879057975e-18'; $VAR13 = '0.000507796144920998'; $VAR14 = '0.00583575640902676'; $VAR15 = '0.0138881858631337'; $VAR16 = '0.00858087300103147'; $VAR17 = '0.0709687827662151'; $VAR18 = '8.44337073916072e-09'; $VAR19 = '0.000100978798859006'; $VAR20 = '2.18809826470324e-09'; $VAR21 = '2.35720243367555e-08'; $VAR22 = '0.000233878173353036'; $VAR23 = '0.0108190182652153'; $VAR24 = '1.51837222498367e-05'; $VAR25 = '0.0439886474468797'; $VAR26 = '0.0837639058544788'; $VAR27 = '5.1748136041113e-07'; $VAR28 = '0.0437603399609289'; $VAR29 = '0.00496119553476288';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.867186716188032'; $VAR2 = '0.000149006096692422'; $VAR3 = '0.132664277715275';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 1; $VAR2 = '0.671330063801907'; $VAR3 = 26; $VAR4 = '0.328669936198093';
Author Z-Score-Based P-Values
$VAR1 = '0.0956755771618791'; $VAR2 = '0.0169869739996606'; $VAR3 = '0.0194407547194331';
Good match. Z-Score-Based P-Value > 0.05.

Control Z-Score-Based P-Values
$VAR1 = '0.00661679827446663'; $VAR2 = '0.00305054344659964'; $VAR3 = '0.0647322321509935'; $VAR4 = '0.0573756525099946'; $VAR5 = '0.00139717503446245'; $VAR6 = '1.79965004338776e-16'; $VAR7 = '2.02142199733767e-08'; $VAR8 = '8.80635785207679e-34'; $VAR9 = '7.79932138211456e-241'; $VAR10 = '2.14537863546996e-36'; $VAR11 = '0.00131156699982828'; $VAR12 = '9.39497203355355e-18'; $VAR13 = '1.65744040944569e-05'; $VAR14 = '0.00198489970207878'; $VAR15 = '0.0046253799657655'; $VAR16 = '0.00572984984393743'; $VAR17 = '0.0729294208195738'; $VAR18 = '3.11592567709835e-05'; $VAR19 = '0.000538960425939253'; $VAR20 = '9.49186286500501e-08'; $VAR21 = '2.20443839505331e-07'; $VAR22 = '0.000132006616553062'; $VAR23 = '0.0394379752156319'; $VAR24 = '0.00398524325338263'; $VAR25 = '0.0204108812699442'; $VAR26 = '0.0655069512584329'; $VAR27 = '5.81521301343538e-06'; $VAR28 = '0.0606773159846999'; $VAR29 = '0.037052977256516';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.724248167173684'; $VAR2 = '0.128588560947643'; $VAR3 = '0.147163271878673';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 1; $VAR2 = '0.567453980055821'; $VAR3 = 17; $VAR4 = '0.432546019944179';
Author Z-Score-Based P-Values
$VAR1 = '0.135078904501818'; $VAR2 = '0.0164330049169535'; $VAR3 = '0.0418255920257736';
Excellent match. Z-Score-Based P-Value > 0.10.

Control Z-Score-Based P-Values
$VAR1 = '0.00800234151531736'; $VAR2 = '0.00229489103784224'; $VAR3 = '0.0596047410602183'; $VAR4 = '0.126687136797961'; $VAR5 = '0.00895146756790502'; $VAR6 = '2.62639498177396e-11'; $VAR7 = '1.03631923299016e-10'; $VAR8 = '5.56488435894115e-15'; $VAR9 = '1.81248682539519e-180'; $VAR10 = '1.24069076178567e-81'; $VAR11 = '0.0038523185446857'; $VAR12 = '8.16586345758538e-17'; $VAR13 = '0.000119692616273651'; $VAR14 = '0.00376255793424717'; $VAR15 = '0.0110163045579493'; $VAR16 = '0.00861569118703454'; $VAR17 = '0.0675229482104783'; $VAR18 = '2.42135667078791e-05'; $VAR19 = '0.00010384419987666'; $VAR20 = '4.59554089396257e-08'; $VAR21 = '5.20662392236287e-07'; $VAR22 = '0.000222381780764746'; $VAR23 = '0.0364552955176135'; $VAR24 = '1.75579565835158e-05'; $VAR25 = '0.0237539861646592'; $VAR26 = '0.069422689213621'; $VAR27 = '7.89070448310917e-05'; $VAR28 = '0.0440751072171114'; $VAR29 = '0.018573053969703';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.698668926062244'; $VAR2 = '0.0849964688390627'; $VAR3 = '0.216334605098693';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 1; $VAR2 = '0.516029137435453'; $VAR3 = 4; $VAR4 = '0.483970862564547';
Author Z-Score-Based P-Values
$VAR1 = '0.189525919025166'; $VAR2 = '0.0168636137230183'; $VAR3 = '0.0247732041793049';
Excellent match. Z-Score-Based P-Value > 0.10.

Control Z-Score-Based P-Values
$VAR1 = '0.00474757719758126'; $VAR2 = '0.00214473585815616'; $VAR3 = '0.0601935188822396'; $VAR4 = '0.115714909628259'; $VAR5 = '0.000621548139443569'; $VAR6 = '5.05310788978207e-13'; $VAR7 = '4.26046675720424e-07'; $VAR8 = '4.26386380629512e-12'; $VAR9 = '9.21222428850802e-173'; $VAR10 = '1.80161020526088e-29'; $VAR11 = '0.00270512872842512'; $VAR12 = '4.72301914415365e-17'; $VAR13 = '0.000499656106281198'; $VAR14 = '0.00392411287588487'; $VAR15 = '0.0140356089586303'; $VAR16 = '0.0061817613252526'; $VAR17 = '0.0782610357081385'; $VAR18 = '3.49835106627283e-07'; $VAR19 = '0.00118816821926077'; $VAR20 = '7.20131135072874e-08'; $VAR21 = '2.13963017923509e-07'; $VAR22 = '1.14334665597004e-05'; $VAR23 = '0.0276372230801344'; $VAR24 = '0.00413310846205375'; $VAR25 = '0.0622713594047995'; $VAR26 = '0.143895610573104'; $VAR27 = '4.39497771153755e-07'; $VAR28 = '0.0758625428199919'; $VAR29 = '0.0155745269463051';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.819880926935971'; $VAR2 = '0.0729512634569128'; $VAR3 = '0.107167809607116';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 1; $VAR2 = '0.568427357566022'; $VAR3 = 26; $VAR4 = '0.431572642433978';
Author Z-Score-Based P-Values
$VAR1 = '0.127635123348666'; $VAR2 = '0.000161005483443711'; $VAR3 = '0.0325762869441846';
Excellent match. Z-Score-Based P-Value > 0.10.

Control Z-Score-Based P-Values
$VAR1 = '0.0249236335642817'; $VAR2 = '0.000195949126252223'; $VAR3 = '0.0397640629310382'; $VAR4 = '0.0442602718403091'; $VAR5 = '0.021287838870501'; $VAR6 = '1.24345470755688e-35'; $VAR7 = '1.30142246256675e-07'; $VAR8 = '0'; $VAR9 = '1.08897174676258e-232'; $VAR10 = '9.4638749992208e-91'; $VAR11 = '0.00333314545480604'; $VAR12 = '4.45327558615917e-25'; $VAR13 = '0.00328320445284432'; $VAR14 = '0.00185571161600504'; $VAR15 = '0.0138703432634595'; $VAR16 = '0.00555228719234714'; $VAR17 = '0.0357386810494259'; $VAR18 = '2.36037109634694e-06'; $VAR19 = '0.000172656833885411'; $VAR20 = '2.41367325438706e-10'; $VAR21 = '2.80235898635616e-09'; $VAR22 = '9.4003667791662e-06'; $VAR23 = '0.0347238835919638'; $VAR24 = '2.25716876895229e-11'; $VAR25 = '0.0110761997067075'; $VAR26 = '0.0401505144431028'; $VAR27 = '7.54931737203513e-10'; $VAR28 = '0.0134465916862291'; $VAR29 = '0.00658066002496649';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.795867061868706'; $VAR2 = '0.00100394748476135'; $VAR3 = '0.203128990646532';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 1; $VAR2 = '0.742516244884564'; $VAR3 = 4; $VAR4 = '0.257483755115436';
Author Z-Score-Based P-Values
$VAR1 = '0.13738137999186'; $VAR2 = '0.00530013818383036'; $VAR3 = '0.0437171656181973';
Excellent match. Z-Score-Based P-Value > 0.10.

Control Z-Score-Based P-Values
$VAR1 = '0.0149616848741633'; $VAR2 = '0.00129207669596575'; $VAR3 = '0.0655531990569553'; $VAR4 = '0.0780435496146124'; $VAR5 = '0.00398209907694098'; $VAR6 = '1.81544686730898e-08'; $VAR7 = '2.63510503015552e-06'; $VAR8 = '4.46067998569299e-14'; $VAR9 = '2.70382395087962e-194'; $VAR10 = '1.55025254228169e-28'; $VAR11 = '0.000901275889851466'; $VAR12 = '4.25800570432781e-19'; $VAR13 = '0.000395758026310977'; $VAR14 = '0.00205049563824668'; $VAR15 = '0.00786214433048355'; $VAR16 = '0.00764262797208844'; $VAR17 = '0.0271316237041956'; $VAR18 = '2.7102017150026e-06'; $VAR19 = '0.000797181870815189'; $VAR20 = '4.47802000788539e-06'; $VAR21 = '5.30174657255771e-07'; $VAR22 = '0.000135749404947173'; $VAR23 = '0.0140624866737551'; $VAR24 = '0.00382965516715977'; $VAR25 = '0.0253111330350242'; $VAR26 = '0.0970014317859946'; $VAR27 = '1.3192312142553e-05'; $VAR28 = '0.0387486871259045'; $VAR29 = '0.0418227102338343';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.737029775080231'; $VAR2 = '0.0284344185052886'; $VAR3 = '0.234535806414481';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 1; $VAR2 = '0.58614101840398'; $VAR3 = 26; $VAR4 = '0.41385898159602';
Author Z-Score-Based P-Values
$VAR1 = '0.136370802057711'; $VAR2 = '0.00159857833163258'; $VAR3 = '0.0172956949153583';
Excellent match. Z-Score-Based P-Value > 0.10.

Control Z-Score-Based P-Values
$VAR1 = '0.00568610916986102'; $VAR2 = '0.00456936695187693'; $VAR3 = '0.0636686809739386'; $VAR4 = '0.0680755406253694'; $VAR5 = '0.00514832785670642'; $VAR6 = '4.60793415446126e-11'; $VAR7 = '6.91315501037634e-08'; $VAR8 = '2.14298109929115e-08'; $VAR9 = '1.60155047762868e-249'; $VAR10 = '3.08146853487366e-29'; $VAR11 = '0.0082120240306622'; $VAR12 = '2.50211998533869e-16'; $VAR13 = '0.000236921602727938'; $VAR14 = '0.00826720127873251'; $VAR15 = '0.010133424632304'; $VAR16 = '0.0189382765303874'; $VAR17 = '0.0526026982256684'; $VAR18 = '2.77483432444593e-10'; $VAR19 = '0.000226355763090211'; $VAR20 = '1.16122599317272e-06'; $VAR21 = '9.24903611121384e-07'; $VAR22 = '0.000301723330503152'; $VAR23 = '0.0160206628835168'; $VAR24 = '0.00164787006584991'; $VAR25 = '0.047720212632697'; $VAR26 = '0.0600367493074169'; $VAR27 = '4.18864905077825e-06'; $VAR28 = '0.0329770370671062'; $VAR29 = '0.00739714314530839';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.878309573418803'; $VAR2 = '0.0102958010904605'; $VAR3 = '0.111394625490737';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 1; $VAR2 = '0.667024903786634'; $VAR3 = 4; $VAR4 = '0.332975096213366';
Author Z-Score-Based P-Values
$VAR1 = '0.14278106513849'; $VAR2 = '0.00997493711260808'; $VAR3 = '0.0761397408138361';
Excellent match. Z-Score-Based P-Value > 0.10.

Control Z-Score-Based P-Values
$VAR1 = '0.0211599582692338'; $VAR2 = '0.00363064656281264'; $VAR3 = '0.0367122613590212'; $VAR4 = '0.0451035143944168'; $VAR5 = '0.00407073643440096'; $VAR6 = '8.13582348302604e-12'; $VAR7 = '1.30876062242339e-08'; $VAR8 = '2.93545026254554e-17'; $VAR9 = '1.55120811858387e-249'; $VAR10 = '9.76528916397391e-236'; $VAR11 = '0.00022354346101783'; $VAR12 = '1.01254663831992e-20'; $VAR13 = '0.000970427679693843'; $VAR14 = '0.0193643941913745'; $VAR15 = '0.011113006859313'; $VAR16 = '0.0304550530958849'; $VAR17 = '0.0926016269798557'; $VAR18 = '3.63123856023538e-06'; $VAR19 = '0.00225693366535087'; $VAR20 = '0.000788555573142484'; $VAR21 = '0.000167302481797237'; $VAR22 = '0.00659841150935002'; $VAR23 = '0.0353272912587754'; $VAR24 = '0.00450306594057594'; $VAR25 = '0.0210389680189147'; $VAR26 = '0.0492579294174438'; $VAR27 = '0.0111052701144541'; $VAR28 = '0.0355933040971247'; $VAR29 = '0.0597516361165316';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.623782090600022'; $VAR2 = '0.0435785173592255'; $VAR3 = '0.332639392040753';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 1; $VAR2 = '0.606591180742816'; $VAR3 = 17; $VAR4 = '0.393408819257184';
Author Z-Score-Based P-Values
$VAR1 = '0.120471364806243'; $VAR2 = '0.0162388685617554'; $VAR3 = '0.0459337033815459';
Excellent match. Z-Score-Based P-Value > 0.10.

Control Z-Score-Based P-Values
$VAR1 = '0.0195776421949735'; $VAR2 = '0.00508212995090966'; $VAR3 = '0.0452433495816131'; $VAR4 = '0.0419806074102327'; $VAR5 = '0.00197943639376425'; $VAR6 = '1.31162890177067e-11'; $VAR7 = '5.15478494565572e-08'; $VAR8 = '1.64427061191786e-12'; $VAR9 = '1.99419046393481e-278'; $VAR10 = '9.09233604201113e-235'; $VAR11 = '6.35596380937309e-06'; $VAR12 = '2.36584286030872e-20'; $VAR13 = '2.17037662681308e-05'; $VAR14 = '0.00374974857661464'; $VAR15 = '0.00569289008431'; $VAR16 = '0.0239483837880326'; $VAR17 = '0.046516541569381'; $VAR18 = '8.76618186095479e-05'; $VAR19 = '0.00290968599584928'; $VAR20 = '0.00122890914156351'; $VAR21 = '0.000310871408621188'; $VAR22 = '0.0118792329789819'; $VAR23 = '0.0181705324670237'; $VAR24 = '0.000836237854485852'; $VAR25 = '0.014470316005342'; $VAR26 = '0.0521271403191852'; $VAR27 = '0.0103171454274442'; $VAR28 = '0.0311503162895643'; $VAR29 = '0.0189896369112289';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.659596847014105'; $VAR2 = '0.0889099788952942'; $VAR3 = '0.251493174090601';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 1; $VAR2 = '0.697986142572301'; $VAR3 = 26; $VAR4 = '0.3020138574277';
Author Z-Score-Based P-Values
$VAR1 = '0.0681560263593454'; $VAR2 = '0.00567815620741305'; $VAR3 = '0.0818869497635298';
Good match. Z-Score-Based P-Value > 0.05.

Control Z-Score-Based P-Values
$VAR1 = '0.0369453019167157'; $VAR2 = '0.00667417909576646'; $VAR3 = '0.0408707884218824'; $VAR4 = '0.0388687361545007'; $VAR5 = '0.0061412858488877'; $VAR6 = '1.81249692477716e-13'; $VAR7 = '8.88366382355043e-13'; $VAR8 = '1.45731151767119e-23'; $VAR9 = '0'; $VAR10 = '6.43779705196076e-74'; $VAR11 = '0.00264709903943586'; $VAR12 = '4.73609442383191e-20'; $VAR13 = '0.00204116499497372'; $VAR14 = '0.0220899232232348'; $VAR15 = '0.0206425169940387'; $VAR16 = '0.0370990687208815'; $VAR17 = '0.0657884676089261'; $VAR18 = '0.0236326485453446'; $VAR19 = '0.0129272807154352'; $VAR20 = '0.00300271962117876'; $VAR21 = '0.0021221685121358'; $VAR22 = '0.0195135696971412'; $VAR23 = '0.0809185232532458'; $VAR24 = '8.01610225955405e-13'; $VAR25 = '0.00069909167914679'; $VAR26 = '0.012529095139316'; $VAR27 = '0.0132373345674838'; $VAR28 = '0.00854334996233293'; $VAR29 = '0.0115132573365989';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.437680007455795'; $VAR2 = '0.036463620078035'; $VAR3 = '0.52585637246617';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 3; $VAR2 = '0.502974182907795'; $VAR3 = 23; $VAR4 = '0.497025817092205';
Author Z-Score-Based P-Values
$VAR1 = '0.122360514871706'; $VAR2 = '0.0020295026025486'; $VAR3 = '0.0423668554353428';
Excellent match. Z-Score-Based P-Value > 0.10.

Control Z-Score-Based P-Values
$VAR1 = '0.0144049783845335'; $VAR2 = '0.0145038631784281'; $VAR3 = '0.0687224792012786'; $VAR4 = '0.108417694522221'; $VAR5 = '0.0542926979914485'; $VAR6 = '4.54873449688004e-11'; $VAR7 = '9.15805274686801e-08'; $VAR8 = '0'; $VAR9 = '4.69116018964686e-263'; $VAR10 = '2.33332083972019e-77'; $VAR11 = '0.0246851172389869'; $VAR12 = '3.62961477959803e-14'; $VAR13 = '0.000245801839435087'; $VAR14 = '0.0101187054318321'; $VAR15 = '0.0147920493197497'; $VAR16 = '0.0158339553964157'; $VAR17 = '0.0419279804119123'; $VAR18 = '0.000280579993972628'; $VAR19 = '0.00338974876798934'; $VAR20 = '5.49632491709367e-07'; $VAR21 = '2.00132515228599e-05'; $VAR22 = '0.000478244026570665'; $VAR23 = '0.0275792780796866'; $VAR24 = '9.65289021430619e-06'; $VAR25 = '0.0060473892411165'; $VAR26 = '0.0190793313089856'; $VAR27 = '3.18806228752215e-05'; $VAR28 = '0.0212429758836958'; $VAR29 = '0.0206027557417192';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.733765947614288'; $VAR2 = '0.0121704285234998'; $VAR3 = '0.254063623862212';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 1; $VAR2 = '0.53020826876615'; $VAR3 = 4; $VAR4 = '0.46979173123385';
"... almost every critical biblical position was earlier advanced by skeptics." - Raymond Brown

Re: Clement of Alexandria -- A Basic Stylometric Study

Post by Peter Kirby »

And, just for fun, I separated out the quotes from the Stromata (marked by the editor with quotation marks, anyway) and compared them to Clement.

It can't be verified as the writing of Clement of Alexandria. Well, that's good to know!
testsize: 2663
$VAR1 = 59; $VAR2 = 29; $VAR3 = 64; $VAR4 = 147; $VAR5 = 8; $VAR6 = 44; $VAR7 = 9; $VAR8 = 19; $VAR9 = 22; $VAR10 = 30; $VAR11 = 48; $VAR12 = 26; $VAR13 = 16; $VAR14 = 5; $VAR15 = 10; $VAR16 = 10; $VAR17 = 9; $VAR18 = 6; $VAR19 = 10; $VAR20 = 6; $VAR21 = 18; $VAR22 = 20;

22 Words
$VAR1 = [ 'AUTOS', 'AUTOU', 'AUTWi', 'AUTON', 'AUTOI', 'AUTWN', 'AUTOIS', 'AUTOUS', 'AUTH', 'AUTHS', 'AUTHi', 'AUTHN', 'AUTAI', 'AUTWN', 'AUTAIS', 'AUTAS', 'AUTO', 'AUTA' ]; $VAR2 = [ 'TIS', 'TINOS', 'TINI', 'TINA', 'TINES', 'TINWN', 'TISI', 'TISIN', 'TINAS', 'TI', 'TINA' ]; $VAR3 = [ 'EIMI', 'EI', 'ESTI', 'ESTIN', 'ESMEN', 'ESTE', 'EISI', 'EISIN', 'HN', 'HSQA', 'HN', 'HMEN', 'HTE', 'HSAN', 'ESOMAI', 'ESHi', 'ESEI', 'ESTAI', 'ESOMEQA', 'ESESQE', 'ESONTAI', 'W', 'HiS', 'Hi', 'WMEN', 'HTE', 'WSI', 'EIHN', 'EIHS', 'EIH', 'EIHMEN', 'EIMEN', 'EIHTE', 'EITE', 'EIHSAN', 'EIEN', 'ESOIMHN', 'ESOIO', 'ESOITO', 'ESOIMEQA', 'ESOISQE', 'ESOINTO', 'ISQI', 'ESTW', 'ESTE', 'ESTWN', 'ONTWN', 'ESTWSAN', 'EINAI', 'ESESQAI', 'WN', 'OUSA', 'ON', 'ESOMENOS', 'ESOMENH', 'ESOMENON' ]; $VAR4 = [ 'KAI' ]; $VAR5 = [ 'TE' ]; $VAR6 = [ 'DE', 'D' ]; $VAR7 = [ 'MEN' ]; $VAR8 = [ 'ALLA', 'ALL' ]; $VAR9 = [ 'GAR' ]; $VAR10 = [ 'EIS' ]; $VAR11 = [ 'EN' ]; $VAR12 = [ 'EK', 'EC' ]; $VAR13 = [ 'KATA', 'KAT', 'KAQ' ]; $VAR14 = [ 'PROS' ]; $VAR15 = [ 'OUN' ]; $VAR16 = [ 'INA' ]; $VAR17 = [ 'OTI' ]; $VAR18 = [ 'APO', 'AP' ]; $VAR19 = [ 'PERI' ]; $VAR20 = [ 'POLUS', 'POLLOU', 'POLLWi', 'POLUN', 'POLLH', 'POLLHS', 'POLLHi', 'POLLHN', 'POLU', 'POLLOU', 'POLLWi', 'POLU', 'POLLOI', 'POLLWN', 'POLLOIS', 'POLLOUS', 'POLLAI', 'POLLWN', 'POLLAIS', 'POLLAS', 'POLLA', 'POLLWN', 'POLLOIS', 'POLLA' ]; $VAR21 = [ 'PAS', 'PANTOS', 'PANTI', 'PANTA', 'PAS', 'PASA', 'PASHS', 'PASHi', 'PASAN', 'PASA', 'PAN', 'PANTOS', 'PANTI', 'PAN', 'PANTES', 'PANTWN', 'PASI', 'PASIN', 'PANTAS', 'PANTES', 'PASAI', 'PASWN', 'PASAIS', 'PASAS', 'PASAI', 'PANTA', 'PANTWN' ]; $VAR22 = [ 'EPI', 'EP' ];

Author Z-Score-Based P-Values
$VAR1 = '0.0851802438581617'; $VAR2 = '0.0724749834074447'; $VAR3 = '0.064286883957332';
Good match. Z-Score-Based P-Value > 0.05.

Control Z-Score-Based P-Values
$VAR1 = '0.00190437533607135'; $VAR2 = '0.0134963811649367'; $VAR3 = '0.141273334998977'; $VAR4 = '0.124951326899165'; $VAR5 = '0.0518220257168887'; $VAR6 = '2.16760803715612e-06'; $VAR7 = '0.000296736155878467'; $VAR8 = '6.929923557776e-12'; $VAR9 = '3.79213525315146e-67'; $VAR10 = '2.2577358826784e-16'; $VAR11 = '0.00273097840803051'; $VAR12 = '7.42388525889789e-09'; $VAR13 = '0.00435259762804016'; $VAR14 = '0.00143267823333169'; $VAR15 = '0.00337363803985972'; $VAR16 = '0.00828495704831723'; $VAR17 = '0.010618325608902'; $VAR18 = '5.98973976614139e-07'; $VAR19 = '0.000594576185258456'; $VAR20 = '4.96432404140559e-09'; $VAR21 = '7.18452132102276e-10'; $VAR22 = '3.01670111804679e-05'; $VAR23 = '0.0244209185087774'; $VAR24 = '6.0996662018508e-05'; $VAR25 = '0.0179900750931756'; $VAR26 = '0.14545087264223'; $VAR27 = '2.10064907187007e-08'; $VAR28 = '0.119497354155087'; $VAR29 = '0.00551171200172479';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.383794870602989'; $VAR2 = '0.32654904023439'; $VAR3 = '0.289656089162622';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 1; $VAR2 = '0.369335435524447'; $VAR3 = 26; $VAR4 = '0.630664564475553';

Percentage of Samples in the Best Author Candidate that Meet the P-Value>0.0851 Test, Z-Score-Based Method
1
Percentage of Samples outside the Best Author Candidate that Meet the P-Value>0.0851 Test, Z-Score-Based Method
0.3359375
Posterior Probability of a Sample Meeting the Test Being by the Best Author Candidate (with Prior = 0.5), not Any Other, Z-Score-Based Method
0.748538011695906

Percentage of Samples in the Second-Best Author Candidate that Meet the P-Value>0.0851 Test, Z-Score-Based Method
0
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Second-Best Author, Z-Score-Based Method
1

Percentage of Samples in the Best Control Candidate that Meet the P-Value>0.0851 Test, Z-Score-Based Method
0.833333333333333
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Best Control Author, Z-Score-Based Method
0.545454545454545
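
The two "Bayesian" lines in the output above look like nothing more than the aggregate P-values rescaled so that they sum to one (with equal priors, the priors cancel out). The Python sketch below is not the program that produced the output, and the function name posterior_from_equal_priors is made up for illustration. It takes the three author P-values and the best control P-value (control 26) printed above and rescales them.

```python
def posterior_from_equal_priors(p_values):
    """Rescale likelihood-like scores so they sum to 1 (equal priors cancel)."""
    total = sum(p_values)
    return [p / total for p in p_values]

# Author P-values printed above for the Stromata quotes.
authors = [0.0851802438581617, 0.0724749834074447, 0.064286883957332]
# Best control P-value printed above (control 26).
best_control = 0.14545087264223

author_posteriors = posterior_from_equal_priors(authors)
best_pair = posterior_from_equal_priors([max(authors), best_control])

print(author_posteriors)  # compare with the "Bayesian Author Test" line
print(best_pair)          # compare with the "Best Author to Best Control" line
```

Run on these inputs, the first author's share comes out at about 0.384 and the author-versus-control share at about 0.369, which agrees with the printed figures. Three author candidates all sitting near one third is another way of seeing that the sample does not separate them.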
"... almost every critical biblical position was earlier advanced by skeptics." - Raymond Brown

Re: Clement of Alexandria -- A Basic Stylometric Study

Post by Peter Kirby »

Peter Kirby wrote: If using a 'z-scored p-value < 0.1' test, only 10 of 32 candidates are excluded.

If using a 'z-scored p-value < 0.05' test, only 3 of 32 candidates are excluded.
This seemed to be a handy metric of the validity of a result, so it's been baked into the printed results.

For example, here are the results for the quotes from the Stromata again.
testsize: 2663
$VAR1 = 59; $VAR2 = 29; $VAR3 = 64; $VAR4 = 147; $VAR5 = 8; $VAR6 = 44; $VAR7 = 9; $VAR8 = 19; $VAR9 = 22; $VAR10 = 30; $VAR11 = 48; $VAR12 = 26; $VAR13 = 16; $VAR14 = 5; $VAR15 = 10; $VAR16 = 10; $VAR17 = 9; $VAR18 = 6; $VAR19 = 10; $VAR20 = 6; $VAR21 = 18; $VAR22 = 20;

22 Words
$VAR1 = [ 'AUTOS', 'AUTOU', 'AUTWi', 'AUTON', 'AUTOI', 'AUTWN', 'AUTOIS', 'AUTOUS', 'AUTH', 'AUTHS', 'AUTHi', 'AUTHN', 'AUTAI', 'AUTWN', 'AUTAIS', 'AUTAS', 'AUTO', 'AUTA' ]; $VAR2 = [ 'TIS', 'TINOS', 'TINI', 'TINA', 'TINES', 'TINWN', 'TISI', 'TISIN', 'TINAS', 'TI', 'TINA' ]; $VAR3 = [ 'EIMI', 'EI', 'ESTI', 'ESTIN', 'ESMEN', 'ESTE', 'EISI', 'EISIN', 'HN', 'HSQA', 'HN', 'HMEN', 'HTE', 'HSAN', 'ESOMAI', 'ESHi', 'ESEI', 'ESTAI', 'ESOMEQA', 'ESESQE', 'ESONTAI', 'W', 'HiS', 'Hi', 'WMEN', 'HTE', 'WSI', 'EIHN', 'EIHS', 'EIH', 'EIHMEN', 'EIMEN', 'EIHTE', 'EITE', 'EIHSAN', 'EIEN', 'ESOIMHN', 'ESOIO', 'ESOITO', 'ESOIMEQA', 'ESOISQE', 'ESOINTO', 'ISQI', 'ESTW', 'ESTE', 'ESTWN', 'ONTWN', 'ESTWSAN', 'EINAI', 'ESESQAI', 'WN', 'OUSA', 'ON', 'ESOMENOS', 'ESOMENH', 'ESOMENON' ]; $VAR4 = [ 'KAI' ]; $VAR5 = [ 'TE' ]; $VAR6 = [ 'DE', 'D' ]; $VAR7 = [ 'MEN' ]; $VAR8 = [ 'ALLA', 'ALL' ]; $VAR9 = [ 'GAR' ]; $VAR10 = [ 'EIS' ]; $VAR11 = [ 'EN' ]; $VAR12 = [ 'EK', 'EC' ]; $VAR13 = [ 'KATA', 'KAT', 'KAQ' ]; $VAR14 = [ 'PROS' ]; $VAR15 = [ 'OUN' ]; $VAR16 = [ 'INA' ]; $VAR17 = [ 'OTI' ]; $VAR18 = [ 'APO', 'AP' ]; $VAR19 = [ 'PERI' ]; $VAR20 = [ 'POLUS', 'POLLOU', 'POLLWi', 'POLUN', 'POLLH', 'POLLHS', 'POLLHi', 'POLLHN', 'POLU', 'POLLOU', 'POLLWi', 'POLU', 'POLLOI', 'POLLWN', 'POLLOIS', 'POLLOUS', 'POLLAI', 'POLLWN', 'POLLAIS', 'POLLAS', 'POLLA', 'POLLWN', 'POLLOIS', 'POLLA' ]; $VAR21 = [ 'PAS', 'PANTOS', 'PANTI', 'PANTA', 'PAS', 'PASA', 'PASHS', 'PASHi', 'PASAN', 'PASA', 'PAN', 'PANTOS', 'PANTI', 'PAN', 'PANTES', 'PANTWN', 'PASI', 'PASIN', 'PANTAS', 'PANTES', 'PASAI', 'PASWN', 'PASAIS', 'PASAS', 'PASAI', 'PANTA', 'PANTWN' ]; $VAR22 = [ 'EPI', 'EP' ];

Author Z-Score-Based P-Values
$VAR1 = '0.0850371937529633'; $VAR2 = '0.0725428286743953'; $VAR3 = '0.0645305982464296';
Decent compatibility. Z-Score-Based P-Value > 0.05.
Poor indicator. 22.5% others with P-Value > 0.05.


Control Z-Score-Based P-Values
$VAR1 = '0.00188696923841321'; $VAR2 = '0.0134054452934796'; $VAR3 = '0.14121502037573'; $VAR4 = '0.125519981945472'; $VAR5 = '0.0521983563840763'; $VAR6 = '2.1707630208328e-06'; $VAR7 = '0.000296715186260485'; $VAR8 = '6.93442929575849e-12'; $VAR9 = '1.7299079994794e-67'; $VAR10 = '7.47851133323473e-17'; $VAR11 = '0.002791748346296'; $VAR12 = '9.34882392453667e-09'; $VAR13 = '0.00430737734281189'; $VAR14 = '0.00143248134317908'; $VAR15 = '0.00349081485947745'; $VAR16 = '0.0081767518985651'; $VAR17 = '0.0105903941027185'; $VAR18 = '5.98856413343923e-07'; $VAR19 = '0.000594577047668175'; $VAR20 = '4.96428289372844e-09'; $VAR21 = '7.18444719382476e-10'; $VAR22 = '3.01661467556279e-05'; $VAR23 = '0.0244207670023187'; $VAR24 = '6.09948007155562e-05'; $VAR25 = '0.017990161932696'; $VAR26 = '0.14545117735424'; $VAR27 = '2.10058996891061e-08'; $VAR28 = '0.119497447614848'; $VAR29 = '0.00551173549315115';

Bayesian Author Test: Posterior Probabilities from Equal Priors, Z-Score-Based Method
$VAR1 = '0.382859646670641'; $VAR2 = '0.326606753222028'; $VAR3 = '0.290533600107331';

Bayesian Comparison of Best Author to Best Control: from Equal Priors, Z-Score-Based Method
$VAR1 = 1; $VAR2 = '0.368943532137729'; $VAR3 = 26; $VAR4 = '0.631056467862271';

Percentage of Samples in the Best Author Candidate that Meet the P-Value>0.085 Test, Z-Score-Based Method
1
Percentage of Samples outside the Best Author Candidate that Meet the P-Value>0.085 Test, Z-Score-Based Method
0.33984375
Posterior Probability of a Sample Meeting the Test Being by the Best Author Candidate (with Prior = 0.5), not Any Other, Z-Score-Based Method
0.746355685131195

Percentage of Samples in the Second-Best Author Candidate that Meet the P-Value>0.085 Test, Z-Score-Based Method
0
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Second-Best Author, Z-Score-Based Method
1

Percentage of Samples in the Best Control Candidate that Meet the P-Value>0.085 Test, Z-Score-Based Method
0.833333333333333
Posterior Probability of a Sample Meeting the Test Being by the Best Author, not the Best Control Author, Z-Score-Based Method
0.545454545454545
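
The three "Posterior Probability of a Sample Meeting the Test" figures above are consistent with a plain two-hypothesis Bayes update, where the likelihood for each side is the fraction of its samples that pass the P-value test (the output labels these "Percentage", but they are printed as fractions between 0 and 1). This is a sketch under that assumption, not the original program; posterior_given_pass is a hypothetical name.

```python
def posterior_given_pass(hit_rate_author, hit_rate_other, prior=0.5):
    """P(best author | sample passes the test), for two competing hypotheses."""
    numerator = hit_rate_author * prior
    denominator = numerator + hit_rate_other * (1.0 - prior)
    return numerator / denominator

# Pass rates taken from the output above.
vs_any_other = posterior_given_pass(1.0, 0.33984375)
vs_second_best = posterior_given_pass(1.0, 0.0)
vs_best_control = posterior_given_pass(1.0, 0.833333333333333)

print(vs_any_other, vs_second_best, vs_best_control)
```

The last figure is the telling one: when five out of six samples from the best control also pass the test, passing it moves the needle only from 0.5 to about 0.55.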
"... almost every critical biblical position was earlier advanced by skeptics." - Raymond Brown

Re: Clement of Alexandria -- A Basic Stylometric Study

Post by Tenorikuma »

I confess to not knowing what most of the numbers mean. But if Clement is the top match out of all author candidates and controls, surely that itself is somewhat remarkable. Either that, or a forger is purposely trying to fool stylometric analysis by copying sentence construction patterns from genuine works by Clement. Or have I misunderstood?

Re: Clement of Alexandria -- A Basic Stylometric Study

Post by Peter Kirby »

Tenorikuma wrote: I confess to not knowing what most of the numbers mean. But if Clement is the top match out of all author candidates and controls, surely that itself is somewhat remarkable. Either that, or a forger is purposely trying to fool stylometric analysis by copying sentence construction patterns from genuine works by Clement. Or have I misunderstood?
These were the aggregated P-Values for each author/control, compared against the test sample. They are 'aggregated' in the sense that they come from a weighted average calculated over all 20 words whose frequencies are measured. Each of the twenty individual "P-values" (the likelihood of the observed frequency being that far off the mean) goes into the weighted average, which tries to gauge how likely it is that the author/control would have produced the sample.
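
To make the idea concrete, here is a minimal Python sketch of a per-word, two-sided P-value under a normal assumption, followed by a weighted average. It is not the actual program. The three 'words' and their numbers are made up, and the choice of weights (each word weighted by its expected count) is purely an assumption, since the weighting scheme is not spelled out here.

```python
import math

def two_sided_p(observed, mean, sd):
    """Chance of a normal draw landing at least this far from the mean."""
    z = abs(observed - mean) / sd
    return math.erfc(z / math.sqrt(2.0))

def aggregate_p(observed, means, sds, weights):
    """Weighted average of the per-word P-values."""
    ps = [two_sided_p(o, m, s) for o, m, s in zip(observed, means, sds)]
    return sum(w * p for w, p in zip(weights, ps)) / sum(weights)

# Illustrative numbers only (three made-up 'words').
means = [10.0, 40.0, 22.0]
sds = [4.0, 8.0, 6.0]
observed = [14, 45, 15]
weights = means  # ASSUMPTION: weight by expected count; the real weights are not given

print(aggregate_p(observed, means, sds, weights))
```

A count sitting exactly on the mean contributes a P-value of 1, and a count two standard deviations away contributes about 0.046, so the aggregate always lands between 0 and 1.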

Ideally you would like one of the candidate authors to have a very high aggregate, Z-Score-Based P-Value... and, more importantly, you would like it to be far and away greater than the rest. The maximum is 1.

Here they were again (using 22 words and using the entire 'Letter to Theodore'):
Author Chi-Square-Based P-Values
$VAR1 = '0.897126717178744'; $VAR2 = '0.876686010922532'; $VAR3 = '0.766954128799045';

Control Chi-Square-Based P-Values
$VAR1 = '0.137898770544387'; $VAR2 = '0.263214501782374'; $VAR3 = '0.97251600690767'; $VAR4 = '0.996383635782013'; $VAR5 = '0.995727137880758'; $VAR6 = '0.000948092883388932'; $VAR7 = '0.253210360195565'; $VAR8 = '0.00852822453421988'; $VAR9 = '0.65938916375342'; $VAR10 = 0; $VAR11 = '0.346606811740825'; $VAR12 = '1.0011159413525e-12'; $VAR13 = '0.00377297557020349'; $VAR14 = '0.169854249175826'; $VAR15 = '0.599467656898274'; $VAR16 = '0.493172636244664'; $VAR17 = '0.912577608896469'; $VAR18 = '0.277004254454816'; $VAR19 = '0.00103826061119019'; $VAR20 = '0.149481145224522'; $VAR21 = '0.000251868655303691'; $VAR22 = '0.851018068030414'; $VAR23 = '0.697616764950275'; $VAR24 = '0.442629632622023'; $VAR25 = '0.993407920946543'; $VAR26 = '0.98246524429608'; $VAR27 = '0.118313132250566'; $VAR28 = '0.998364840186754'; $VAR29 = '0.8885746060188';
You could put them in two categories if you like (those with scores close to or higher than 'Clement', and those without):

0.8 or higher - 7 out of 29 controls score higher than 'Clement' (!), 2 more controls score slightly lower, and 1 other author candidate scores slightly lower

$VAR3 = '0.97251600690767';
$VAR4 = '0.996383635782013';
$VAR5 = '0.995727137880758';
$VAR17 = '0.912577608896469';
$VAR28 = '0.998364840186754';
$VAR26 = '0.98246524429608';
$VAR25 = '0.993407920946543';
$VAR1-author-candidate = '0.897126717178744';
$VAR2-author-candidate = '0.876686010922532';
$VAR22 = '0.851018068030414';
$VAR29 = '0.8885746060188';

Less than 0.8

$VAR1 = '0.137898770544387'; $VAR2 = '0.263214501782374'; $VAR6 = '0.000948092883388932'; $VAR7 = '0.253210360195565'; $VAR8 = '0.00852822453421988'; $VAR9 = '0.65938916375342'; $VAR10 = 0; $VAR11 = '0.346606811740825'; $VAR12 = '1.0011159413525e-12'; $VAR13 = '0.00377297557020349'; $VAR14 = '0.169854249175826'; $VAR15 = '0.599467656898274'; $VAR16 = '0.493172636244664'; $VAR18 = '0.277004254454816'; $VAR19 = '0.00103826061119019'; $VAR20 = '0.149481145224522'; $VAR21 = '0.000251868655303691'; $VAR23 = '0.697616764950275'; $VAR24 = '0.442629632622023'; $VAR27 = '0.118313132250566';
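
The split above can be reproduced mechanically. This Python sketch uses the chi-square-based control P-values quoted above, rounded to six decimal places, with 'Clement' taken as the first author candidate; the variable names are just for illustration.

```python
clement = 0.897127  # first author candidate, rounded
cutoff = 0.8

# Control P-values 1..29 from the output above, rounded to six places.
control_p = [
    0.137899, 0.263215, 0.972516, 0.996384, 0.995727, 0.000948, 0.253210,
    0.008528, 0.659389, 0.0, 0.346607, 1.0e-12, 0.003773, 0.169854,
    0.599468, 0.493173, 0.912578, 0.277004, 0.001038, 0.149481, 0.000252,
    0.851018, 0.697617, 0.442630, 0.993408, 0.982465, 0.118313, 0.998365,
    0.888575,
]

numbered = list(enumerate(control_p, start=1))
above_clement = [i for i, p in numbered if p > clement]
slightly_below = [i for i, p in numbered if cutoff <= p <= clement]
below_cutoff = [i for i, p in numbered if p < cutoff]

print(above_clement)      # [3, 4, 5, 17, 25, 26, 28]
print(slightly_below)     # [22, 29]
print(len(below_cutoff))  # 20
```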

With results like these, we must simply throw up our hands and hope we find the rest of the letter one day (or, get better methods).

Way too many authors and controls present as 'very likely' to have produced the sample, when measuring these words and using a text of this length (749 words). That's not really the root of the problem, however; it's more the symptom.

The method is based on the assumption/convention of fitting the distribution of the frequency of occurrence of each 'word' to a normal distribution (what is sometimes called the "bell" curve). So a 'mean' and 'standard deviation' are computed for each of the words being measured. Approximately 95% of the same-length samples from a given author should have an observed frequency within two standard deviations of the mean, and approximately 68% within one.
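
The 68% and 95% figures come straight from the normal distribution and can be checked with the error function. A short Python check (share_within is a name chosen here for illustration):

```python
import math

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

print(round(share_within(1), 4))  # 0.6827
print(round(share_within(2), 4))  # 0.9545
```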

Here are the means and standard deviations for Clement of Alexandria:
$VAR1 = #1 [ [ '10.235632183908', '4.19313826487864' ], #2 [ '5.66091954022988', '3.41312060406515' ], #3 [ '16.2528735632184', '6.95175226534479' ], #4 [ '39.6551724137931', '7.76053616201855' ], #5 [ '5.17816091954023', '3.71040553867919' ], #6 [ '22.1034482758621', '5.65285907268176' ], #7 [ '6.59770114942529', '3.06053774147782' ], #8 [ '4.88505747126437', '2.36801137460534' ], #9 [ '8.49425287356322', '3.25672977227436' ], #10 [ '5.53448275862069', '3.10106869180802' ], #11 [ '6.14942528735632', '3.7463506707016' ], #12 [ '3.13218390804598', '2.56855405689453' ], #13 [ '4.62068965517241', '3.64751964002605' ], #14 [ '3.9367816091954', '2.4521918388198' ], #15 [ '2.94827586206897', '1.8107826509434' ], #16 [ '0.67816091954023', '0.909645034641222' ], #17 [ '1.36206896551724', '1.45465889506471' ], #18 [ '1.08045977011494', '1.37897505503346' ], #19 [ '2.56896551724138', '2.40051768098924' ], #20 [ '1.37931034482759', '1.29757846360185' ], #21 [ '3.26436781609195', '2.41384783736521' ], #22 [ '3.4367816091954', '2.18774617360459' ] ];
Here are the words (at least, this is a 22-word list, used with the above 22-word list of means and standard deviations):
$VAR1 = [ 'AUTOS', 'AUTOU', 'AUTWi', 'AUTON', 'AUTOI', 'AUTWN', 'AUTOIS', 'AUTOUS', 'AUTH', 'AUTHS', 'AUTHi', 'AUTHN', 'AUTAI', 'AUTWN', 'AUTAIS', 'AUTAS', 'AUTO', 'AUTA' ]; $VAR2 = [ 'TIS', 'TINOS', 'TINI', 'TINA', 'TINES', 'TINWN', 'TISI', 'TISIN', 'TINAS', 'TI', 'TINA' ]; $VAR3 = [ 'EIMI', 'EI', 'ESTI', 'ESTIN', 'ESMEN', 'ESTE', 'EISI', 'EISIN', 'HN', 'HSQA', 'HN', 'HMEN', 'HTE', 'HSAN', 'ESOMAI', 'ESHi', 'ESEI', 'ESTAI', 'ESOMEQA', 'ESESQE', 'ESONTAI', 'W', 'HiS', 'Hi', 'WMEN', 'HTE', 'WSI', 'EIHN', 'EIHS', 'EIH', 'EIHMEN', 'EIMEN', 'EIHTE', 'EITE', 'EIHSAN', 'EIEN', 'ESOIMHN', 'ESOIO', 'ESOITO', 'ESOIMEQA', 'ESOISQE', 'ESOINTO', 'ISQI', 'ESTW', 'ESTE', 'ESTWN', 'ONTWN', 'ESTWSAN', 'EINAI', 'ESESQAI', 'WN', 'OUSA', 'ON', 'ESOMENOS', 'ESOMENH', 'ESOMENON' ]; $VAR4 = [ 'KAI' ]; $VAR5 = [ 'TE' ]; $VAR6 = [ 'DE', 'D' ]; $VAR7 = [ 'MEN' ]; $VAR8 = [ 'ALLA', 'ALL' ]; $VAR9 = [ 'GAR' ]; $VAR10 = [ 'EIS' ]; $VAR11 = [ 'EN' ]; $VAR12 = [ 'EK', 'EC' ]; $VAR13 = [ 'KATA', 'KAT', 'KAQ' ]; $VAR14 = [ 'PROS' ]; $VAR15 = [ 'OUN' ]; $VAR16 = [ 'INA' ]; $VAR17 = [ 'OTI' ]; $VAR18 = [ 'APO', 'AP' ]; $VAR19 = [ 'PERI' ]; $VAR20 = [ 'POLUS', 'POLLOU', 'POLLWi', 'POLUN', 'POLLH', 'POLLHS', 'POLLHi', 'POLLHN', 'POLU', 'POLLOU', 'POLLWi', 'POLU', 'POLLOI', 'POLLWN', 'POLLOIS', 'POLLOUS', 'POLLAI', 'POLLWN', 'POLLAIS', 'POLLAS', 'POLLA', 'POLLWN', 'POLLOIS', 'POLLA' ]; $VAR21 = [ 'PAS', 'PANTOS', 'PANTI', 'PANTA', 'PAS', 'PASA', 'PASHS', 'PASHi', 'PASAN', 'PASA', 'PAN', 'PANTOS', 'PANTI', 'PAN', 'PANTES', 'PANTWN', 'PASI', 'PASIN', 'PANTAS', 'PANTES', 'PASAI', 'PASWN', 'PASAIS', 'PASAS', 'PASAI', 'PANTA', 'PANTWN' ]; $VAR22 = [ 'EPI', 'EP' ];
And here were the observed frequencies for the test sample:
testsize: 749
$VAR1 = 14; $VAR2 = 3; $VAR3 = 15; $VAR4 = 45; $VAR5 = 1; $VAR6 = 15; $VAR7 = 3; $VAR8 = 4; $VAR9 = 9; $VAR10 = 0; $VAR11 = 5; $VAR12 = 7; $VAR13 = 7; $VAR14 = 0; $VAR15 = 3; $VAR16 = 1; $VAR17 = 0; $VAR18 = 2; $VAR19 = 3; $VAR20 = 1; $VAR21 = 4; $VAR22 = 2;
Right off the bat, my program will discard 10 of these words entirely when calculating a Z-score for Clement, because they have a mean number of occurrences below 4. This is just a basic guard against counting data when hardly anything is expected to be there to count.
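
To make that cutoff concrete, here is a minimal Python sketch of the filter (an illustration only, not the program used for this study). It takes the 22 (mean, standard deviation) pairs listed above, rounded to three decimals, and separates the word groups whose mean is below 4. The names `STATS`, `MIN_MEAN` and `split_by_mean` are made up for this example.

```python
# (mean, standard deviation) for word groups #1..#22,
# rounded to three decimals from the list printed above.
STATS = [
    (10.236, 4.193), (5.661, 3.413), (16.253, 6.952), (39.655, 7.761),
    (5.178, 3.710), (22.103, 5.653), (6.598, 3.061), (4.885, 2.368),
    (8.494, 3.257), (5.534, 3.101), (6.149, 3.746), (3.132, 2.569),
    (4.621, 3.648), (3.937, 2.452), (2.948, 1.811), (0.678, 0.910),
    (1.362, 1.455), (1.080, 1.379), (2.569, 2.401), (1.379, 1.298),
    (3.264, 2.414), (3.437, 2.188),
]

MIN_MEAN = 4.0  # cutoff described in the post


def split_by_mean(stats, min_mean=MIN_MEAN):
    """Return (kept, dropped) lists of 1-based word-group numbers."""
    kept, dropped = [], []
    for number, (mean, _sd) in enumerate(stats, start=1):
        if mean >= min_mean:
            kept.append(number)
        else:
            dropped.append(number)
    return kept, dropped


kept, dropped = split_by_mean(STATS)
print("kept:", kept)
print("dropped:", dropped)
```

With the figures above, ten groups fall under the cutoff and twelve remain for the Z-score.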

Just eyeballing it, though, it looks like only five of the words being counted (AUTOS, EIMI, KAI, DE and GAR) have a high enough frequency that a count of 0.0-0.5 isn't within two standard deviations of the mean. That basically means that most of them aren't very useful.
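
The eyeball check can be reproduced with a few lines of Python. This is only a sketch under my reading of the test: a word group counts as useful when its mean minus two standard deviations is still above 0.5. It covers the twelve groups with a mean of at least 4, each labelled by its first listed form; `COUNTED`, `NEAR_ZERO` and `clearly_above_zero` are names invented for the example.

```python
# (mean, standard deviation) for the twelve word groups with mean >= 4,
# rounded to three decimals and labelled by the first form in each group.
COUNTED = {
    "AUTOS": (10.236, 4.193), "TIS": (5.661, 3.413),
    "EIMI": (16.253, 6.952), "KAI": (39.655, 7.761),
    "TE": (5.178, 3.710), "DE": (22.103, 5.653),
    "MEN": (6.598, 3.061), "ALLA": (4.885, 2.368),
    "GAR": (8.494, 3.257), "EIS": (5.534, 3.101),
    "EN": (6.149, 3.746), "KATA": (4.621, 3.648),
}

NEAR_ZERO = 0.5  # upper end of the "0.0-0.5" range mentioned in the post


def clearly_above_zero(stats, floor=NEAR_ZERO, n_sd=2.0):
    """Groups whose mean minus n_sd standard deviations stays above floor."""
    return [word for word, (mean, sd) in stats.items()
            if mean - n_sd * sd > floor]


useful = clearly_above_zero(COUNTED)
print(useful)
```

MEN narrowly misses the cut: its mean minus two standard deviations comes to about 0.48.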

And the root of the root of the problem (not that you really care) is.... the Poisson distribution. The frequency of a given word in a sample of length N from an author (with the word occurring at random with some chance, call it X) is actually better represented by a Poisson distribution. The Poisson distribution looks more like a normal distribution as the sample length and/or the probability of occurrence of the word gets higher. So the normal distribution is a good approximation of the Poisson distribution only when the expected count of the word in a sample is high.
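
A small Python sketch can show how the normal approximation improves as the expected count grows. It compares the Poisson probability of each count k with the probability a normal distribution of the same mean and variance assigns to the interval from k - 0.5 to k + 0.5, then sums half the absolute differences over the counts 0 to `max_k`. The expected counts tried here (1, 4, 10, 40) are arbitrary illustrations, not values from the study, and `approximation_gap` is a name made up for the example.

```python
import math


def poisson_pmf(k, lam):
    """P(count == k) for a Poisson distribution with mean lam > 0."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def normal_bin(k, lam):
    """Normal probability of [k - 0.5, k + 0.5], mean lam, variance lam."""
    sd = math.sqrt(lam)
    return normal_cdf(k + 0.5, lam, sd) - normal_cdf(k - 0.5, lam, sd)


def approximation_gap(lam, max_k=200):
    """Half the summed |Poisson - normal| difference over counts 0..max_k."""
    return 0.5 * sum(abs(poisson_pmf(k, lam) - normal_bin(k, lam))
                     for k in range(max_k + 1))


for lam in (1.0, 4.0, 10.0, 40.0):
    print(lam, round(approximation_gap(lam), 3))
```

The gap shrinks as the expected count rises, which is why a Z-score is only trustworthy for the frequent words.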

But, if the expected number of sightings is very low, then the Poisson distribution just looks like a lumpy mess shoved up against the left side of the number line, and it's practically impossible to tell whether a sample with a certain actual observed frequency (like, say, ... 0, 1, 2 or 3) came from any particular Poisson distribution... and thus impossible to say who wrote the thing with that particular observed frequency of the word 'peri', for example.
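
To put rough numbers on that, here is a hedged Python sketch with two hypothetical authors whose expected counts for some word differ by half: 1.0 versus 1.5 in a short sample, and 20 versus 30 in a sample twenty times as long. These rates are invented for illustration and are not taken from any author in the study. The likelihood ratio says how strongly an observed count favors author A over author B.

```python
import math


def poisson_pmf(k, lam):
    """P(count == k) for a Poisson distribution with mean lam > 0."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def likelihood_ratio(k, lam_a, lam_b):
    """How many times more likely count k is under author A than under B."""
    return poisson_pmf(k, lam_a) / poisson_pmf(k, lam_b)


# Hypothetical expected counts (assumptions for this example only).
LOW_A, LOW_B = 1.0, 1.5      # short sample
HIGH_A, HIGH_B = 20.0, 30.0  # same authors, sample twenty times as long

for k in range(4):
    print("short sample, count", k, round(likelihood_ratio(k, LOW_A, LOW_B), 2))

print("long sample, count 20", round(likelihood_ratio(20, HIGH_A, HIGH_B), 2))
```

For counts of 0 to 3 in the short sample, the ratio never strays far from 1, so the observation barely separates the two authors. In the long sample, a count sitting at author A's mean favors A by a noticeably larger factor.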

Wikipedia has a chart:

Image

The small, useless, lumpy, scrunched-up-against-the-left Poisson distribution is shown in orange... and with a small sample, most 'words' being considered as candidates for distinguishing authorship have distributions that look roughly like that.... for every single author.
"... almost every critical biblical position was earlier advanced by skeptics." - Raymond Brown

Re: Clement of Alexandria -- A Basic Stylometric Study

Post by Peter Kirby »

For a cursory explanation of all the numbers printed by the program, refer to the previous thread:

http://www.earlywritings.com/forum/view ... f=3&t=1567
"... almost every critical biblical position was earlier advanced by skeptics." - Raymond Brown