Get fitted coefficient of linear regression equation

0 votes

I have a dataset with predicted and observed data. The equation that predicts the data is given by: y = A_f T \sqrt{gh}

with Af a constant (currently 1.35), T the wave period, g the gravitational acceleration (9.81 m/s^2), and h the wave height.

I'd like to use linear regression to find the best-fit coefficient (Af in the equation), so that the predicted values are closer to the observed data.

I now have Af = 1.35 (suggested in the literature), which results in r^2 = 0.5676. Ideally, I'd use Python to find the best-fit coefficient for my data.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

# Observed/measured values from the field
X = np.array([11.52, 11.559, 12.31, 16.46, 11.84, 7.38, 9.99, 16.72, 11.617, 11.77, 6.48, 9.035, 12.87, 11.18, 6.75])
# Values predicted by the equation
y = np.array([25.51658407, 24.61306145, 19.4007494, 24.85111923, 25.99397106, 14.30284824, 17.69451713, 27.37460301, 22.23326366, 18.44905152, 10.28001306, 10.68681843, 28.85399089, 14.02840557, 18.41941787]).reshape((-1, 1))

# Adapted from the scikit-learn docs example; this overwrites the arrays
# above with the iris data, so my own X and y are never actually used
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(random_state=0).fit(X, y)

print(clf.coef_, clf.intercept_)

Here X holds the values observed/measured in the field, and y holds the corresponding values predicted by the equation.

I'm having difficulty incorporating the actual equation and finding the best fit for Af.

Apr 11, 2022 in Machine Learning by Dev
• 6,000 points
827 views

0 votes

You may fit an ordinary least squares model from scikit-learn to the data you gave by taking the log: since y = Af * x, we have log(y) = log(Af) + log(x), so the fitted intercept is an estimate of log(Af).

import numpy as np
import pymc3 as pm
import pandas as pd
import theano.tensor as tt
from sklearn.linear_model import LinearRegression
import statsmodels.formula.api as smf

# Log of the observed/measured values
train = np.log(np.array([11.52, 11.559, 12.31, 16.46, 11.84, 7.38, 9.99, 16.72, 11.617,
                         11.77, 6.48, 9.035, 12.87, 11.18, 6.75]))
# Log of the values predicted by the equation
test = np.log(np.array([25.51658407, 24.61306145, 19.4007494, 24.85111923,
                        25.99397106, 14.30284824, 17.69451713, 27.37460301,
                        22.23326366, 18.44905152, 10.28001306, 10.68681843,
                        28.85399089, 14.02840557, 18.41941787]))

# OLS in log space: exp(intercept) recovers Af
reg_model = LinearRegression().fit(train.reshape(-1, 1), test.reshape(-1, 1))
print('Af estimate: ', np.exp(reg_model.intercept_))

This yields the following Af estimate:

Af estimate:  [2.4844087]
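If you would rather keep the model exactly in the form y = Af * x (no additive term, exponent fixed at 1), you can also fit without the log transform by forcing the regression through the origin. This is a minimal sketch of that alternative, not part of the fit above:

import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([11.52, 11.559, 12.31, 16.46, 11.84, 7.38, 9.99, 16.72, 11.617,
              11.77, 6.48, 9.035, 12.87, 11.18, 6.75])
y = np.array([25.51658407, 24.61306145, 19.4007494, 24.85111923,
              25.99397106, 14.30284824, 17.69451713, 27.37460301,
              22.23326366, 18.44905152, 10.28001306, 10.68681843,
              28.85399089, 14.02840557, 18.41941787])

# No-intercept least squares: the single coefficient is Af,
# which also has the closed form sum(x*y) / sum(x*x)
reg = LinearRegression(fit_intercept=False).fit(x.reshape(-1, 1), y)
print('Direct Af estimate:', reg.coef_[0])
print('Closed form:', (x * y).sum() / (x * x).sum())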

As you don't seem to be interested in predicting new data with the model, you might use statsmodels instead, which reports the parameter estimates and their statistics directly.

result = smf.ols('test ~ train', data=pd.DataFrame({'test': test, 'train': train})).fit()
# The intercept (included by default) is the estimate of log(Af)
print('Statsmodels Af estimate: ', np.exp(result.params['Intercept']))

Because this fits the same ordinary least squares model, it yields the same Af estimate as before, and the r² it reports is the same as the one you mentioned.
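Since the statsmodels fit is on the log scale, you can also exponentiate the intercept's confidence interval to get one for Af. A short sketch reusing the result object above:

# conf_int() returns the 95% interval for each parameter on the log scale;
# exponentiating maps the intercept's interval back to the Af scale
ci_low, ci_high = result.conf_int().loc['Intercept']
print('Af 95% CI:', np.exp(ci_low), '-', np.exp(ci_high))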
Finally, my recommendation is to use pymc3 to obtain a full Bayesian fit, which lets you estimate the uncertainty of the quantity you want to measure in a natural way. Although pymc3 has a steep learning curve, it is an excellent library for probabilistic programming. When fitting a model, it estimates the whole posterior over your parameter space, which is what most people are actually interested in. The following is an example of a solution to your problem:

with pm.Model() as model:
    # Priors
    alpha = pm.Normal('alpha', mu=1.35, sd=5)  # centered around the literature value
    beta = pm.HalfNormal('beta', sd=10)        # only positive values, as it goes into the sqrt; is height always positive here?
    sigma = pm.HalfNormal('sigma', sd=1)
    beta2 = pm.Deterministic('beta2', tt.sqrt(beta * 9.81))  # g is very well known
    alpha_f = pm.Deterministic('alpha_f', tt.exp(alpha))     # directly estimate the output value we want

    # Likelihood on the log-transformed data defined above
    likelihood = pm.Normal('y', mu=alpha + beta2 * train, sigma=sigma, observed=test)

    # Sampling
    trace = pm.sample(init='adapt_diag')

print(pm.summary(trace))
          mean     sd  hpd_3%  hpd_97%  ...  ess_sd  ess_bulk  ess_tail  r_hat
alpha    0.781  0.544  -0.232    1.864  ...   309.0     440.0     406.0   1.01
beta     0.091  0.044   0.013    0.167  ...   517.0     438.0     359.0   1.01
sigma    0.259  0.056   0.172    0.368  ...   530.0     479.0     147.0   1.00
beta2    0.917  0.229   0.439    1.316  ...   434.0     438.0     359.0   1.01
alpha_f  2.535  1.552   0.465    5.224  ...   317.0     440.0     406.0   1.01
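The alpha_f row is the posterior of the coefficient you are after. To work with those samples directly, here is a small sketch (assuming the pymc3 3.x MultiTrace indexing used above):

# Extract the posterior samples of Af from the trace
af = trace['alpha_f']
print('Posterior mean Af:', af.mean())
print('94% credible interval:', np.percentile(af, [3, 97]))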

As you can see, there is a great deal of uncertainty in Af.
However, it is important to consider the input data and not to over-interpret the results. At the moment you supply no uncertainty for either y or X, nor any covariance between them. It is quite rare to have perfect knowledge of such measurements, so it is prudent to propagate those uncertainties into your calculations, and pymc3 makes it possible to do so in a natural way. My implementation estimates the noise level from the data itself, but you may have your own measurement-device-based uncertainty, as in the sketch below.
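For instance, if your device gives you a known error on the observed values, one way to include it is to treat the true predictors as latent variables. This is a hypothetical sketch, not part of the fit above; x_err = 0.05 is an assumed, made-up noise level on the log scale that you would replace with your own:

with pm.Model() as model_xerr:
    x_err = 0.05  # assumed measurement sd of the log-observations (hypothetical)
    # Latent 'true' predictor values, centred on the measured ones
    x_true = pm.Normal('x_true', mu=train, sd=x_err, shape=len(train))
    alpha = pm.Normal('alpha', mu=1.35, sd=5)
    beta = pm.HalfNormal('beta', sd=10)
    sigma = pm.HalfNormal('sigma', sd=1)
    beta2 = pm.Deterministic('beta2', tt.sqrt(beta * 9.81))
    alpha_f = pm.Deterministic('alpha_f', tt.exp(alpha))
    # Same likelihood as before, but built on the latent predictors
    pm.Normal('y', mu=alpha + beta2 * x_true, sigma=sigma, observed=test)
    trace_xerr = pm.sample(init='adapt_diag')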

answered Apr 14, 2022 by anonymous

edited Mar 5
