Dabbling in Natural Language Processing using publications by IRRI staff
I first encountered natural language processing (NLP) in a data science class, where a classmate's project opened my eyes to what the method could do. Before that class, I had worked on determining descriptors for rice varieties by finding the adjectives most frequently associated with them and by finding which rice descriptors co-occurred with viand names. I even published a paper on rice descriptors. Anyway, during class, I decided to embark on a side project (a massive one, in my opinion when I was starting it) to sharpen my skills in NLP. I didn't realise how deep the NLP hole was until I started working on it.
Sourcing the documents
I remembered that IRRI lists its staff's publications online. It was just a matter of accessing the information and putting it in an Excel spreadsheet by copying and pasting... Wrong! There are over 6000 articles in the list of publications. There was no way I was going to be able to manually copy the information from the websi