Electronic document authenticity verification of diploma and transcript using smart contract on Ethereum blockchain

Article history: Received 1 June 2020; Revised 30 June 2020; Accepted 19 July 2020; Available online 3 May 2021

Ethereum, one of the longest-established blockchain platforms, replaces centralized storage with distributed storage and records transactions in a decentralized way so that every node can verify them. It is therefore suitable for storing the fingerprints of officially published diploma and transcript documents. A smart contract is needed to make contract transactions on Ethereum through program code, so that diplomas and transcripts uploaded to the Ethereum blockchain can be distributed and validated for authenticity using the transaction hash and consensus, in compliance with the ERC-721 token standard. The results show that a sample of 5 electronic documents in PDF format, each published and secured on the Ethereum blockchain with a transaction time of 1 second per file, can easily be verified for authenticity. The proposed system also takes invalid and failure cases into consideration by giving the necessary feedback to the user.


Introduction
Diplomas and transcripts are certificates formally issued by educational institutions to students who have finished their studies, whether at the elementary, junior high, senior high, or university level. Educational institutions issue many diplomas and transcripts every year, yet fake diplomas and transcripts remain a major problem. Kanan et al. state that fake diplomas and transcripts are a serious problem and that continuous efforts must be made to prevent the forgery of these certificates [1].
As technology advances, efforts to protect data security need to increase [2]. For that reason, blockchain, a technology that has recently received much attention, can be proposed for securing electronic diploma and transcript documents. Blockchain secures data through decentralized storage, interlinked cryptographic hash functions, and consensus [3,4,5,6,7]. Our study uses the public Ethereum blockchain, because a public blockchain allows all nodes to participate in mining and implements truly decentralized technology [4].
In 2019, the Head of HKLI at the Directorate General of Belmawa, Nuril Furkan, discussed the growing number of news reports about fake diplomas and transcripts; these reports emerged during the 2019 general election. For this reason, universities are expected to comply with the regulation stipulated by the Kementrian Riset dan Teknologi Republik Indonesia (Menristekdikti), No. 59 of 2018 [8]. Some companies in Indonesia require prospective employees to send a copy of their diploma as a PDF file when applying for a job. This practice is vulnerable to the falsification of electronic documents, which should be prevented by checking authenticity with a verification system based on file upload.
In Indonesia, paper diplomas bearing holograms and logo stamps have not reduced the number of fake diplomas and transcripts, because genuine paper documents are vulnerable to forgery. In 2017 the Director-General of Belmawa launched a website-based system for online diploma verification, named "Sistem Verifikasi Ijazah Secara Elektronik" (SIVIL). However, this electronic diploma verification system is stored on a centralized system. One weakness of centralized storage is that the database is vulnerable to being hacked, because all information is held in one place [3].
Kanan et al. focused on an authentication system for the authenticity of diplomas using blockchain. The application was implemented at Al-Zaytoonah University, Jordan, but the system verifies diplomas using the student National ID [1]. Cheng et al. focused on a diploma verification system in Taiwan. To solve the problem of fake diplomas in the country, they proposed a prototype blockchain-based system for verifying diplomas, but verification in their system is done by entering a certificate search code [2].
Kumar et al. focused on authenticating education certificates issued by the authorities. They use blockchain technology, which provides cryptographic and distributed properties. The proposed system adds a QR code and an inquiry string code to the paper-based certificate, and the document is authenticated using a phone scanner or a website with the certificate serial number [7].
N. Kumavat et al. focused on the problem of fake academic certificates and on the fact that validating the authenticity of certificates is often costly and complex. They propose storing certificates on the blockchain, but their verification system uses a transaction ID [9].
Another study reviewed the digital signature method and the SHA-1 algorithm for digital document security in legalizing undergraduate diplomas, securing and validating digital document objects for use in diploma validation; according to the authors, legalization depends on good security [10]. That study applies a website-based system, but the system does not use blockchain technology. In addition, the authenticity of a diploma is verified by entering on the website the digital signature printed on the paper certificate.
The research in [10] can verify the authenticity of a digital document, but it does not use blockchain technology, and its verification mechanism relies on digital signatures printed on paper certificates and entered on a website. The studies in [1,2,7,9] verify documents using the student National ID, a certificate search code, a certificate serial number, and a transaction ID, respectively. Moreover, Sistem Verifikasi Ijazah Secara Elektronik (SIVIL) in Indonesia does not use blockchain technology. Therefore, the purpose of the system in this paper is to verify the authenticity of diplomas and transcripts from uploaded Portable Document Format (PDF) files. The proposed method uses the Ethereum blockchain to store the fingerprints of diplomas and transcripts, a smart contract for consensus-based data validation and for making contract transactions on Ethereum through program code, and the SHA-256 algorithm to obtain the fingerprint of diploma and transcript files. Our system consists of a DApp for creating diplomas and transcripts and for verifying them through file upload. A DApp is a decentralized application that runs on the Ethereum blockchain and uses peer-to-peer networks [3,11].

Related Work
SmartCert blockchain imperative for educational certificates [1] presents an authentication system for the authenticity of diplomas built on a blockchain and deployed at Al-Zaytoonah University, Jordan. The advantages of this research are that the system uses a blockchain and was applied directly at a university; verification of diploma authenticity uses the student National ID.
Blockchain and smart contract for digital certificate [2] presents a diploma verification system in Taiwan. To solve the problem of fake diplomas in the country, the authors propose a prototype blockchain-based system for verifying diplomas, in which verification is done by entering a certificate search code. Educational certificate verification system using blockchain [7] observes that verifying the authenticity of diplomas is routinely done by job providers, who need a lot of time to deliver interview results because certificate authentication generally takes a long time. To overcome this problem, the authors create a blockchain-based diploma verification system, because the blockchain provides cryptographic and distributed functions. Their verification system uses a phone scanner or a website with a certificate serial number.
Certificate verification system using blockchain [9] notes that academic certificates issued by tertiary institutions are still given to students as hard copies, that validating the authenticity of certificates often takes a long time, and that there are many cases of fake diplomas. The authors aim to reduce fake certificates with blockchain technology, which can store certificate data, and the verification system they propose uses a transaction ID.
Digital document security on legalize higher education diplomas with digital signature and SHA-1 algorithm [10] presents the security of digital diploma documents using a digital signature method and the SHA-1 algorithm. The digital signature is created and placed on the certificate, and the diploma document is validated using the digital signature on the certificate.
Developing Ethereum blockchain-based document verification smart contract for moodle learning management system [12] presents a digital verification system for education in Turkey that uses a blockchain and smart contract-based system connected to the Moodle learning management system. The system uses the public blockchain on the Ropsten network.
Physical document validation with perceptual hash [13] presents electronic validation of physical documents. The problem addressed is that a physical document yields a different hash value every time it is digitized. The results show that validation with a hash can detect that an electronic file has been changed as well as recognize the original file, but the system does not use blockchain technology.
Implementation of RSA 2048-bit and AES 256-bit with digital signature for secure electronic health record application [14] applies encryption and digital signatures to a different case, health records. The authors propose the RSA 2048-bit, AES 256-bit, and SHA-256 algorithms for encryption and for obtaining digital signatures. The application is designed to provide integrity, confidentiality, and authentication services, and the research uses black-box and white-box testing for input testing.
Blockchain and smart contract for digital document verification [15] observes that students graduate every year from different universities and receive degrees and diplomas, which can be used to apply for jobs or to continue their education. Because of the lack of verification mechanisms, cases of diploma forgery have begun to occur. The proposed solution uses blockchain technology: all personal IDs are entered on the blockchain, an e-certificate is created and given a serial number, the e-certificate is entered into the blockchain, and a QR code is generated and given to the user. The authenticity of the diploma is verified with the serial number or QR code.

Blockchain
Blockchain is a decentralized and distributed database that is widely used to record transactions; the transactions in each block are secured with a cryptographic hash [1,2,3,16]. One platform that uses blockchain is Ethereum [1,2,3]. Ethereum runs on blockchain technology and introduces ideas to avoid dependence on a single entity for storing user data [3].
Blockchain has advantages in terms of security [5]. All blocks in the blockchain are connected through unique hashes and nonces. Because the nodes are decentralized, the blocks held by the nodes cannot be changed and can be verified by many parties [2,17]. A blockchain uses many nodes scattered across the network, and the large number of nodes makes it harder for an attacker to breach the system. One distinctive feature of blockchain technology is mining: a participant, called a miner, who succeeds in solving a mathematical problem receives a coin [17].
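The role of the nonce and of mining can be illustrated with a toy proof-of-work loop. This is a minimal sketch and not Ethereum's actual mining algorithm: the function name `mine`, the block layout, and the difficulty (a number of leading zero hex digits) are assumptions chosen for illustration.

```python
import hashlib
from typing import Tuple


def mine(previous_hash: str, payload: str, difficulty: int = 4) -> Tuple[int, str]:
    """Search for a nonce whose block hash starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        # Toy block layout (assumed): previous hash, payload, and nonce joined by "|".
        block = f"{previous_hash}|{payload}|{nonce}".encode("utf-8")
        digest = hashlib.sha256(block).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


if __name__ == "__main__":
    nonce, digest = mine("00" * 32, "diploma fingerprint")
    print(nonce, digest)
```

Finding the nonce takes many hash attempts, while checking it takes one; that asymmetry is what the "mathematical problem" refers to in this simplified picture.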

Smart contract
A smart contract is a self-executing, computerized transaction protocol that is useful for facilitating and verifying contracts [18]. A smart contract is written in a Turing-complete language and expresses the contract as code. The code is run by the blockchain network once the contract is called, so each contract is stored in a decentralized database and cannot be changed [19,20].
ERC-721 is an open token standard based on the principle that tokens are non-fungible: each token is unique and not interchangeable [21]. ERC-20 is currently the most popular Ethereum standard for creating tokens [21]. ERC-20 can be used to manage token creation across various blockchain implementations [4].
Smart contracts are executed independently when a transaction is validated. To use a smart contract on the blockchain, a transaction must be executed to announce that a new contract is to be entered on the blockchain. The new contract is given a unique address with a length of 160 bits, and its code is uploaded to the blockchain. Once this is complete, the smart contract consists of the contract address, the contract balance, the nonce, and the transaction ID [18]. We write the smart contract in Solidity, a contract-oriented programming language designed to run on the Ethereum Virtual Machine (EVM). Like other programming languages, Solidity has data types and variables, and it resembles object-oriented programming in features such as inheritance. The contract provides the following simple functions:
• Admin can store a diploma electronic document on the blockchain.
• Admin can store a transcript electronic document on the blockchain.
• Stakeholder can verify the authenticity of a diploma from the blockchain.
• Stakeholder can verify the authenticity of a transcript from the blockchain.
To implement the ERC-721 token, the following data types were created for the contract mechanism above:
• An object to store documents electronically. Each document is fingerprinted using the SHA-256 hashing algorithm and held in an object before being distributed on the blockchain.
• A map that keeps a 32-byte record of each document. Each mapped entry is 32 bytes.
Fig. 1 is an example of a struct data type in the Solidity programming language that is used to group the attributes that support writing smart contract programs.
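The contract functions and data types listed above can be modeled off-chain as follows. This is a Python sketch of the data model only, not the authors' Solidity code: `DocumentRecord`, `DocumentRegistry`, and their fields are hypothetical names, and a dictionary keyed by 32-byte digests stands in for the contract's mapping.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class DocumentRecord:
    """Stands in for the per-document struct (field names are assumed)."""
    fingerprint: bytes  # 32-byte SHA-256 digest of the PDF
    kind: str           # "diploma" or "transcript"
    issuer: str         # address of the admin who stored the record


class DocumentRegistry:
    """In-memory stand-in for the contract state; not a real smart contract."""

    def __init__(self, admin: str) -> None:
        self.admin = admin
        self._records: Dict[bytes, DocumentRecord] = {}

    def store(self, sender: str, pdf_bytes: bytes, kind: str) -> bytes:
        # Only the admin may register a diploma or transcript.
        if sender != self.admin:
            raise PermissionError("only the admin may store documents")
        fingerprint = hashlib.sha256(pdf_bytes).digest()
        self._records[fingerprint] = DocumentRecord(fingerprint, kind, sender)
        return fingerprint

    def verify(self, pdf_bytes: bytes) -> Optional[DocumentRecord]:
        # Any stakeholder may look up a document by its fingerprint.
        return self._records.get(hashlib.sha256(pdf_bytes).digest())
```

Keying the map by the digest itself means verification needs nothing except the file, which matches the file-upload approach the paper argues for.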

Method
In this paper, we propose a system for securing and verifying electronic diploma and transcript documents using blockchain. We use blockchain because it is decentralized and guarantees high data integrity [6,22,23]. Specifically, we use the Ethereum blockchain, which makes data storage decentralized and distributed rather than centralized [11,24,25,26].
The system we created is based on the relevant studies. It produces diploma and transcript documents as electronic files in PDF format and generates a hash value that is unique to each document, so the data we store are cryptographic hash values. Storing these values on a blockchain strengthens them further: they are protected against falsification, and an attacker cannot alter the decentralized blockchain, because the nodes are distributed and every change must be verified by a consensus algorithm, namely Proof-of-Work. This makes the system very difficult to hack.
The cryptographic hash values generated by our system consist of 66 characters, and each document produces a unique value. Because our system verifies the authenticity of diplomas and transcripts from PDF files, stakeholders do not need to worry about mistyping hash values: the system accepts the PDF file, generates the hash from the file automatically, and checks whether that value is stored in the blockchain.
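One plausible reading of the length of 66 reported above is a SHA-256 digest written as 64 hexadecimal digits with the customary "0x" prefix. The source does not state this, so treat it as an assumption. The sketch below computes such a fingerprint; `fingerprint_bytes`, `fingerprint_file`, and the chunk size are illustrative choices.

```python
import hashlib


def fingerprint_bytes(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as '0x' followed by 64 hex digits."""
    return "0x" + hashlib.sha256(data).hexdigest()


def fingerprint_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so that large PDFs need not fit in memory."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return "0x" + hasher.hexdigest()
```

Because the fingerprint is derived from the file, the stakeholder never types it, which is the error source the paragraph above is concerned with.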
One popular attack method is the Man-In-The-Middle (MITM) attack, in which an attacker compromises the integrity of data in a system [27,28,29]. MITM is among the most common attacks in computer network security and threatens both confidentiality and integrity [27]. Strong data protection is therefore needed, and blockchain technology is used to ensure data integrity [3]. In our system, the hash value generated by the SHA-256 algorithm is stored in an array of structs, together with the addresses. To secure diplomas and transcripts, the admin creates the document and exports it as a PDF file, and the document is then submitted to the system. When the admin sends the PDF to the blockchain, the system automatically obtains the hash value and saves the smart contract address.

System overview
The general description of the system can be seen in Fig. 2. Our DApp is connected to the Ethereum blockchain and has smart contract code for securing diploma and transcript files and for verifying them.
a. Secure Diplomas and Transcripts:
• Make Document Diplomas and Transcript
The admin creates an electronic diploma and transcript document by entering the information of each student and generating a PDF document. A QR code is added as the digital signature in the chancellor and dean sections.

• Get Fingerprint & Address
The system obtains the hash of the diploma and transcript document with the SHA-256 algorithm. The Secure Hash Algorithm (SHA) is a development of the MD family of hash functions and was introduced by the U.S. National Institute of Standards and Technology (NIST) as a FIPS standard in 1993 [6,30]. Variants of the SHA algorithm include SHA-0, SHA-224, SHA-256, SHA-384, and SHA-512. After the hash value is obtained, it is entered into an array of structs. The address used is produced when the smart contract program is compiled in the system, and this address is also entered into the array of structs. We use three smart contracts. The first, called "file registration", sends the hash of a newly created file to the Ethereum blockchain. The second, called "verify file", is used to verify the electronic diploma and transcript documents recorded in the consensus block on the Ethereum blockchain. The third, "migration file", is used to create a new address on the Ethereum blockchain.

• Smart Contract Process
At this stage, smart contracts are written in the Solidity programming language using the Remix IDE. The contracts are then used to carry out transactions that include the hash, address, ether value, gas value, and gas limit. Smart contracts are important in the DApp because they perform its basic programming operations. In our system the contract is used as:

• Transaction Manager
At this stage, the web server checks whether the ether value and gas value are sufficient to carry out the transaction on the Ethereum blockchain. If they are sufficient, the contract, which contains the addresses, is signed using an ECDSA signature and processing continues to the Ethereum blockchain; if they are insufficient, the transaction is canceled.

• Web3js JSON-RPC
At this stage, Web3js JSON-RPC connects our DApp to the Ethereum blockchain. The DApp is supported by web3js, which works with the smart contracts to obtain consensus validation from the Ethereum blockchain nodes. Transactions that meet the criteria are uploaded to the blockchain, and the smart contract code is run on the Ethereum network so that the nodes reach consensus and validation is done without a third party. Several systems use Web3js JSON-RPC for data distribution [30,31,32,33,34]. Fig. 5 is an overview of the DApp connected to the JSON-RPC API and then to Ethereum via the API on the back end. After all steps of the method have been carried out, the system displays a notification that the electronic diploma and transcript documents have been successfully uploaded to the blockchain; the notification includes the contract address, hash value, and transaction ID.

Verification starts when a stakeholder uploads the diploma and transcript PDF files to the system. The system then computes the fingerprint of the document to obtain the hash value, and the smart contract sends the hash through Web3js JSON-RPC, which is connected to the blockchain.
If the hash is on the blockchain, the system reports that the file is valid and displays the block number and timestamp, indicating that the file is authentic and trustworthy. If the hash is not on the blockchain, the system reports that the document is invalid. Fig. 7 is an overview of our system interface for uploading diploma and transcript files for verification.
Fig. 7. Uploading diploma and transcript
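The verification outcome described here can be sketched as a lookup of the uploaded file's fingerprint. This is a hedged illustration: the in-memory `ledger` dictionary stands in for the query that the DApp sends through Web3js JSON-RPC, and `verify_upload`, `VerificationResult`, and the stored block number and timestamp are hypothetical.

```python
import hashlib
from typing import Dict, NamedTuple, Optional, Tuple


class VerificationResult(NamedTuple):
    valid: bool
    block_number: Optional[int]
    timestamp: Optional[int]


def verify_upload(pdf_bytes: bytes, ledger: Dict[str, Tuple[int, int]]) -> VerificationResult:
    """Hash the uploaded file and look the digest up in the (simulated) ledger."""
    digest = "0x" + hashlib.sha256(pdf_bytes).hexdigest()
    entry = ledger.get(digest)
    if entry is None:
        # Unknown fingerprint: the document is reported as invalid.
        return VerificationResult(False, None, None)
    block_number, timestamp = entry
    return VerificationResult(True, block_number, timestamp)
```

A caller would populate `ledger` from the chain; here any dictionary mapping fingerprints to `(block_number, timestamp)` pairs works.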

Process overview
Given the distributed and decentralized nature of the Ethereum blockchain, our system works as follows:
1. The university records student information; the system then generates a PDF file and records the fingerprint of the file (it records only the fingerprint, not the student data).
2. The system verifies all documents.
3. The university hands over the PDF files whose fingerprints have been recorded to the students. Students do not need a paper diploma, because paper diplomas can still be falsified and create paper waste. The PDF file contains the diploma and transcript, digitally signed by the chancellor and dean using a QR code.
4. When the diploma and transcript need to be verified, for example when applying for a job or continuing education at a higher level, the student simply sends the PDF document to the company or university concerned.
5. The company or university verifies on the system whether the file is valid or fake.
Fig. 8 illustrates how our system works. The flow starts with the administrator creating an electronic diploma and transcript document by entering the student's details, such as name, place of birth, university, rector, grades, and others. The DApp then creates the QR code for the chancellor and dean, and the document is generated as a PDF. The PDF is preprocessed by taking its fingerprint hash with the SHA-256 algorithm, and the hash is submitted through the smart contract, transaction manager, and JSON-RPC steps. If the transaction is approved by consensus among the blockchain nodes, the hash value is stored on the Ethereum blockchain; otherwise the transaction is canceled. Finally, the original document whose hash has been recorded is sent to the student by email as a PDF file.
Once students have received their diploma and transcript documents, the documents can be given to interested parties and verified. To verify the authenticity of an electronic diploma or transcript, a stakeholder uploads the PDF file to the website-based DApp that we created. The system computes the SHA-256 hash of the uploaded file, and the hash is passed through the smart contract and JSON-RPC to be looked up at the contract address on the Ethereum blockchain. If the hash value is present and correct, the DApp reports that the file is valid, along with the digital signature and block number; if the hash value is not present on the Ethereum blockchain, the file is declared invalid.

Results and Discussion
The system that we built can be seen in Fig. 9. Only the admin can create diploma and transcript documents, while stakeholders can only verify the authenticity of diplomas and transcripts.
To create diplomas and transcripts, the admin enters the student information and generates a PDF document. After the document has been created, the system computes its hash value. The hash is stored in the struct of the smart contract and uploaded to the blockchain. If the upload to the public blockchain succeeds, the system reports that the document has been successfully uploaded. Fig. 10 shows the information the system provides when a diploma and transcript document has been uploaded to the Ethereum blockchain and thereby decentralized: the hash, transaction ID, and contract address. The hash of the document is recorded with the transaction ID on the nodes scattered across the blockchain, which makes it difficult to hack or falsify: if a document's hash value is not found on the blockchain, the document is fake.
Fig. 11. Notification verification valid file page
In Fig. 11, the system detects an original diploma and transcript document, so we obtain the hash, block number, and timestamp. If a fake file is verified, the system shows it as invalid, as in Fig. 12. The hash value of the document shown in Fig. 12 is not on the blockchain, so our system states that the file is invalid and fake. To confirm that the hash of a document has been uploaded to the blockchain, we can inspect the transactions from our system's blockchain address on Ethereum, as shown in Fig. 13.
In Fig. 13, the transaction hash, status, block number, and other fields indicate that our diploma and transcript files were successfully uploaded to the blockchain; in particular, Ethereum reports a success status. Once a transaction has been entered on the Ethereum blockchain, it cannot be altered or deleted, so the diploma and transcript data are safe from falsification. Table 1 presents the results of the integrity testing of the electronic diploma and transcript documents. Based on these results, obtaining fingerprint values with the SHA-256 hashing algorithm and storing them with blockchain technology provides very good authentication, as seen from the file hash, valid hash, and block number. The block number shows that the data are stored in a block on Ethereum, which is decentralized and secured with cryptographic hashes that are linked to each other: each block always contains the hash of the previous block. If the file hash and the valid hash differ, the file is not in any block on Ethereum because the file is invalid. This minimizes the falsification of electronic diploma and transcript documents, and the system has strong security because it uses decentralized Ethereum blockchain storage secured by interlinked cryptography.
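The linkage in which each block carries the hash of its predecessor is what makes tampering detectable. A toy sketch follows; the block fields and the helper names `block_hash`, `build_chain`, and `chain_is_consistent` are assumptions for illustration and do not reflect Ethereum's real block format.

```python
import hashlib
import json
from typing import Dict, List


def block_hash(block: Dict[str, object]) -> str:
    """Hash a block's fields in a deterministic (sorted-key) JSON encoding."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def build_chain(fingerprints: List[str]) -> List[Dict[str, object]]:
    """Create one toy block per fingerprint, each pointing at its predecessor."""
    chain: List[Dict[str, object]] = []
    previous = "0" * 64  # placeholder predecessor for the first block
    for number, fingerprint in enumerate(fingerprints):
        block = {"number": number, "fingerprint": fingerprint, "previous_hash": previous}
        chain.append(block)
        previous = block_hash(block)
    return chain


def chain_is_consistent(chain: List[Dict[str, object]]) -> bool:
    """Check that every block stores the hash of the block before it."""
    return all(
        chain[i]["previous_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )
```

Changing the fingerprint in an earlier block changes that block's hash, so the `previous_hash` stored in the next block no longer matches and the check fails.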

Performance
In this section we examine the performance of the 5 files when a transaction is made to the Ethereum blockchain, including transaction speed, gwei, gas limit, gas cost, total price, and transaction hash on Ethereum.
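The listed quantities are related by standard Ethereum fee arithmetic: the fee equals the gas consumed multiplied by the gas price, and 1 gwei is 10^-9 ether. The sketch below applies this to a gas figure of 1613249, the lower end of the range reported for the test documents. The 20 gwei gas price is an assumed value, not a measurement from the paper, and the result is an upper bound if the figure is a gas limit rather than gas actually used. The helper names are hypothetical.

```python
from decimal import Decimal

GWEI_PER_ETHER = Decimal(10) ** 9


def fee_in_ether(gas: int, gas_price_gwei: Decimal) -> Decimal:
    """Fee = gas x gas price, converted from gwei to ether."""
    return Decimal(gas) * gas_price_gwei / GWEI_PER_ETHER


def can_afford(balance_ether: Decimal, gas_limit: int, gas_price_gwei: Decimal) -> bool:
    """Mirror of the transaction manager's sufficiency check (simplified)."""
    return balance_ether >= fee_in_ether(gas_limit, gas_price_gwei)


if __name__ == "__main__":
    print(fee_in_ether(1613249, Decimal(20)))  # prints 0.03226498
```

`Decimal` is used instead of floating point so that the currency arithmetic stays exact.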
The evaluation in Table 2 shows that the transaction for each of the 5 test documents takes 1 second, which can be considered fast. The total price is tied to the gas cost and file size: if the file is large, the gas cost becomes large, so the total price paid increases with the gas cost. The gas limit of each document is in the range 1613249 to 1654428; increasing the gas cost within this range affects the transaction speed, resulting in faster transactions and a greater total price. A transaction that is accepted and forwarded by the smart contract to the Ethereum blockchain is signed with an ECDSA signature and uploaded to the blockchain nodes with consensus validation, so that the file has a hash value and a digital signature value on Ethereum. We also tested file integrity on the blockchain by compressing each file as much as possible with online compression tools, without changing its contents; the results can be seen in Table 3. They show that if a file is compressed, even without changing its content and metadata, its hash differs from the one previously stored in the Ethereum blockchain, whereas if the file is only copied and pasted the hash does not change, because the integrity of the file is maintained. This preserves the authenticity of the diploma and transcript files, because once a diploma and transcript file has been created and uploaded to the Ethereum blockchain, it is the final and original file.
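The effect reported for Table 3 follows from hashing raw bytes: any re-encoding changes the digest even when the content is recoverable unchanged, while a byte-for-byte copy does not. In this sketch `zlib` stands in for the online PDF compression tools, which is an assumption, and the sample bytes are made up.

```python
import hashlib
import zlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


original = b"%PDF-1.4 sample diploma content " * 50
copied = bytes(original)              # copy and paste: identical bytes
compressed = zlib.compress(original)  # re-encoded: different bytes, same content

print(sha256_hex(copied) == sha256_hex(original))      # True
print(sha256_hex(compressed) == sha256_hex(original))  # False
print(zlib.decompress(compressed) == original)         # True
```

The last line confirms that no content was lost, yet the compressed file would still fail verification because its fingerprint is different.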

Discussion
The verification of paper-based diplomas and transcripts is not discussed in detail in this paper, although there is potential to address it. This paper discusses file-based verification as a way to counter fake documents and to replace document validation processes that were previously costly and time-consuming.
A file that has been altered, even without changing its visible content, can be detected by comparing its hash with the hash on the blockchain, because any corruption of the file or change to its metadata changes its hash value. Once the hash of a file has been uploaded to Ethereum through the smart contract, the hash cannot be modified, so the hash of a corrupted file will not be registered in Ethereum.
Paper-based diplomas and transcripts are still being issued, but verification and legalization use file uploads on a blockchain-based system, so stakeholders who want to verify the authenticity of a diploma simply provide a copy of the electronic file they were given. Future studies can address paper-based documents digitized using Optical Character Recognition (OCR) or barcode techniques. The system could then apply an off-chain mechanism to the files created and an on-chain mechanism to the transaction records.

Comparative study
This section summarizes the diploma verification systems based on blockchain technology from the related work. The system that we propose verifies the authenticity of diplomas and transcripts through file uploads on the public Ethereum blockchain. Table 4 summarizes the comparative study.
Kanan et al. [1]: Making systems with blockchain technology and verification using student National ID.
Cheng et al. [2]: Making systems with blockchain technology and verification using certificate search code.
Kumavat et al. [9]: Making systems with blockchain technology and verification using transaction ID.
Kumar et al. [7]: Making systems with blockchain technology and verification using phone scanner or website with a certificate serial number.
Kumari et al. [15]: Making systems with blockchain technology and verification using serial number or QR code.

Conclusion
The system can detect the authenticity of electronic diploma and transcript documents using file-based verification on the Ethereum blockchain. File-based verification can prevent fake diplomas and transcripts and makes verification easier, and because the file replaces the printed versions of diplomas and transcripts, it also saves paper. Testing shows that the transaction time from file to Ethereum is 1 second per file. The file integrity tests show that if a file is damaged or modified, its hash differs from the original on the Ethereum blockchain, so a file that is detected as authentic is one that has not been modified or falsified and whose hash matches that of the original file stored on Ethereum. A system based on blockchain technology can reduce the falsification of electronic documents, because publishing and verification are carried out transparently within the system, and the system can guarantee that the information provided is correct and accurate.