Build elasticsearch index for covid-19 corpus (cosi-132a final project)

2ez4szliu/es-index-java

es-index-java

Build an Elasticsearch index for a COVID-19 corpus (COSI-132a final project). The data for this project come from Kaggle's COVID-19 Open Research Dataset Challenge (CORD-19).

Getting Started

Below are the prerequisites and instructions for using this project.

Prerequisites

Java version 1.8.0

Instructions

  1. Download the project as a zip file or clone this repository
  2. Download all dependencies listed in the pom.xml file, either by importing pom.xml into an IDE (IntelliJ IDEA, etc.) or by installing Maven and running the command
     mvn install
  3. Download all data sets from the Kaggle website: https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge
  4. Put all of those data files in a folder called data in the project's root directory
  5. In the same root directory, create two folders: index-data and processed
  6. Run Parser.main() to preprocess the data in all JSON files
  7. Run MetadataCSVParser.main() to process the metadata.csv file
  8. Run IndexJsonFile.main() to start building the Elasticsearch index (make sure an Elasticsearch instance is already running in your local environment)
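
Before running the three main() methods, it can help to confirm that the folder layout and the local Elasticsearch instance are in place. The class below is a minimal sketch of such a check and is not part of this repository: the name Preflight and its methods are hypothetical, and it assumes Elasticsearch is listening on its default HTTP port (9200) on localhost with security disabled. It only uses the Java 8 standard library.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of this repository.
class Preflight {

    // Folder names taken from the instructions above.
    static final String[] REQUIRED_FOLDERS = {"data", "index-data", "processed"};

    /** Creates any missing folder under root; existing folders are left untouched. */
    static void ensureFolders(Path root) throws IOException {
        for (String name : REQUIRED_FOLDERS) {
            Files.createDirectories(root.resolve(name));
        }
    }

    /** Returns true if an HTTP GET on the given URL answers with status 200. */
    static boolean isReachable(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(2000); // milliseconds
            conn.setReadTimeout(2000);
            conn.setRequestMethod("GET");
            int status = conn.getResponseCode();
            conn.disconnect();
            return status == 200;
        } catch (IOException e) {
            // Malformed URL, connection refused, timeout, and similar failures.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Project root: first argument if given, otherwise the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        ensureFolders(root);
        System.out.println("Folders ready under " + root.toAbsolutePath());

        // Assumption: default local Elasticsearch address, security disabled.
        String esUrl = "http://localhost:9200";
        if (isReachable(esUrl)) {
            System.out.println("Elasticsearch is reachable at " + esUrl);
        } else {
            System.out.println("Elasticsearch is not reachable at " + esUrl
                    + "; start it before running IndexJsonFile.main()");
        }
    }
}
```

Run it from the project's root directory (or pass the root path as the first argument). The data folder is created if it is missing, but the CORD-19 files still have to be copied into it by hand.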

To be continued...