I want my program to learn by itself

I often wonder how I can incorporate advanced learning techniques into my programs.

Currently, I am working on a project that determines whether a statement someone made is “good” or “bad” using a points system that bases “good”/“bad” on data from the internet. You can find the project here, though it is still a work in progress and very early in development:

https://github.com/kayleoss/moral-test

This process works by extracting words from the statement, analyzing the connotations of each word based on results from the internet, and calculating “points” from a predetermined, points-based Excel sheet that lists “good” and “bad” words with their respective points (either negative or positive). For example, the word “politics” has a -5 point rating because data surrounding the word “politics” in a statement largely shows that it is used in a negative manner.
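To make the points idea concrete, here is a minimal sketch in Python, written purely for illustration. The dictionary stands in for the Excel sheet; only the “politics” entry comes from this post, and the other words, their points, and the function name `score_statement` are made-up assumptions.

```python
import re

# Stand-in for the spreadsheet. Only "politics" (-5) comes from the post;
# the other entries are hypothetical examples.
WORD_POINTS = {"politics": -5, "hate": -10, "love": 10}

def score_statement(statement, word_points=WORD_POINTS):
    """Add up the points of every known word; unknown words count as 0."""
    words = re.findall(r"[a-z']+", statement.lower())
    return sum(word_points.get(word, 0) for word in words)
```

With these made-up values, `score_statement("I hate politics")` adds -10 and -5 together, and a statement with no listed words scores 0.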

I realize this is an oversimplification, and that using my Excel sheet as the source of truth is a huge flaw in the program because the words are ranked and scored by me. However, I hope to improve this process and change the source of truth in the future, perhaps to a larger third-party database that uses better ways of measuring a word’s connotation.

The problem with my project is that it doesn’t learn. It doesn’t come to a comprehensive understanding of what “good” or “bad” is over time. It is simply a points-based analysis that may often be inaccurate.

I want my program to use the internet as a base for its decisions, not as its only source. So the problem I am stuck on is: how do I get my program to learn? How do I code a program to learn?

For one, it would definitely involve its own database. Currently, the program has none. Perhaps one approach would be to store every rated statement and its rating in a database, so the program knows what has been rated already and what rating each previous statement received. For example, if someone makes a statement like “I hate you” and the analysis concludes it has a -50 rating, this will be stored in the database.
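One way this storage step might look is sketched below using SQLite, which ships with Python. The table name `ratings`, its columns, and the helper names are all assumptions made for the example; nothing like this exists in the project yet.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) the ratings database. Table layout is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ratings ("
        "statement TEXT PRIMARY KEY, rating REAL NOT NULL)"
    )
    return conn

def save_rating(conn, statement, rating):
    """Store a rated statement, replacing any earlier rating for it."""
    conn.execute(
        "INSERT OR REPLACE INTO ratings (statement, rating) VALUES (?, ?)",
        (statement.strip().lower(), rating),
    )
    conn.commit()

def lookup_rating(conn, statement):
    """Return the stored rating for an exact (normalized) match, else None."""
    row = conn.execute(
        "SELECT rating FROM ratings WHERE statement = ?",
        (statement.strip().lower(),),
    ).fetchone()
    return row[0] if row else None
```

Saving “I hate you” with a rating of -50 and then looking it up again, in any letter case, returns that rating. Passing a file path instead of `":memory:"` would keep the ratings between runs.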

The next time someone types a similar statement, or a statement that contains any of the words stored in the ratings database, the program can adjust its calculation based on the ratings of those previously rated statements. It should take the ratings of previous statements into consideration, and still conduct an independent search on the internet and make that calculation as always.
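What counts as a “similar” statement is an open question. As one simple possibility, the sketch below measures word overlap (Jaccard similarity) between the new statement and each stored one. The threshold `min_overlap`, the function names, and the plain dictionary standing in for the database are all assumptions for illustration.

```python
import re

def tokenize(statement):
    """Split a statement into a set of lowercase words."""
    return set(re.findall(r"[a-z']+", statement.lower()))

def similar_ratings(statement, stored, min_overlap=0.3):
    """Return (similarity, rating) pairs for stored statements that share
    enough words with the new one. `stored` maps statement -> rating."""
    words = tokenize(statement)
    matches = []
    for past_statement, rating in stored.items():
        past_words = tokenize(past_statement)
        union = words | past_words
        if not union:
            continue  # both statements had no words at all
        similarity = len(words & past_words) / len(union)
        if similarity >= min_overlap:
            matches.append((similarity, rating))
    return matches
```

For example, “I hate Mondays” shares two of four distinct words with a stored “I hate you”, so it matches with a similarity of 0.5, while a stored statement with no words in common is ignored.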

The hardest part, of course, would be determining how much weight (emphasis) should be placed on the ratings of similar statements already stored in the database, and how much that should influence the result of the independent analysis of internet sources, which are always changing. Unless the method is backed by an expensive scientific research study, I feel it would be quite easy to get the approach wrong here. Society has always been conflicted on whether to reference the past or look to the future. In this case it will be both, but how much of the past and how much of the future?
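The weighting question can at least be written down as a single tunable number. In the sketch below, `history_weight` is a made-up parameter between 0 and 1: 0 ignores the past entirely and 1 ignores the fresh internet analysis. The default of 0.3 is arbitrary, not a recommendation, and `matches` is assumed to be a list of (similarity, rating) pairs for similar past statements.

```python
def blend_rating(fresh_rating, matches, history_weight=0.3):
    """Blend the fresh analysis with past ratings of similar statements.

    matches: list of (similarity, rating) pairs; more similar past
    statements count for more in the historical average.
    """
    total_similarity = sum(similarity for similarity, _ in matches)
    if total_similarity == 0:
        return fresh_rating  # nothing relevant in the past, so trust the fresh result
    history_rating = (
        sum(similarity * rating for similarity, rating in matches)
        / total_similarity
    )
    return history_weight * history_rating + (1 - history_weight) * fresh_rating
```

With a fresh rating of -20 and one past match rated -50, a `history_weight` of 0.3 gives 0.3 × -50 + 0.7 × -20, which is -29 up to floating-point rounding. Choosing that weight well is exactly the hard part described above; the code only makes the trade-off explicit.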

Either way, I think this might be a way of creating a program that uses a very simple learning technique that will continue to update and change over time as more statements get analyzed.

However, I can’t help but question my approach. Perhaps AI companies are doing something like this: storing data and the decisions made on that data, to be referenced again when new data comes in. They most likely use fancy math operations or probability functions in their code. Or perhaps there is a better approach and they have done something completely different? One can only wonder.